PHYCS 3730/5720
Assignment 8

Fall Semester 2000


Home | Announcements | Lecture Plan | Assignments | Exercises | References | Web Resources

Exercise 1.

This is an exercise with Monte Carlo integration. What is the volume of a 5-dimensional sphere of unit radius?

Write a C code called mcint.cc that operates as follows:

   darkstar> mcint 10000 987654321
   5-D unit sphere volume: XXXX
   uncertainty sigma_I: YYYY
The output XXXX should be your answer obtained from simple Monte Carlo integration with 10000 random sample points (specified by the first command-line argument) using the drand() random number generator (RNG) seeded (by sdrand()) with the value given in the second command-line argument. The sigma_I value YYYY should be an estimate of the error in your answer as discussed in lecture and in Numerical Recipes in C, section 7.6.

As in Assignment 7, Exercise 1, use a makefile, ~p5720/makefiles/a08.mk, to access function prototypes in in "p5720.h" and definitions in the p5720 class library. A makefile ~p5720/makefiles/a08.mk can help with compilation and linking to the class library.

Hints:

  • A unit sphere in n-dimensions cannot fit into an n-D unit cube. But you can take advantage of the spherical symmetry of the problem and work with a unit square/cube/hypercube that lies in a single quadrant (2-D), octant (3-D), or whatever these generalize to in arbitrary n-D.

  • You may wish to define a class "rndvec5d" which simply has a constructor that sets all five components of your vector using drand(), and has a member function which gives the norm of the vector.

    Turn in your code mcint.cc


    Exercise 2.

    This problem deals with data compresibility as a measure of information in a stream of bits. In the directory ~p5720/data you will find the following files, all of exactly the same length:

    The relevant differences between these files is the amount of information they carry. The first file is just a stream of random bits. This could, in principle, correspond to a large amount of unique information -- it has no "extra fluff" in terms of repeated patterns -- but we do not have a "key" to unlock this information. The second file, "randc.txt" is also random, but not in all bits -- only those bits used to represent the keyboard characters are used. The third file is extremely high-quality English text, and the last makes for much less interesting reading. (It is based on Stanley Kubrick's film "the Shining", in which an author, played by Jack Nicholson, goes beserk [that's the executive summary]. In one scene, the author works furiously at the typewriter, night and day. The result of all this effort is a ream of paper with the same sentence written over and over and over...as in "jack.txt".)

    Use the utility "gzip" and the shell command "ls -l" to determine the degree of compression (or information content) in these files.

    In a text file info.txt, type in the resulting length of each file using the format

         randb.bin.gz: ###### bytes
         randc.txt.gz: ###### bytes
         (etc.)
    
    Following this list, write a short paragraph discussing gzip's ability to the compress these files.

    Submit your info.txt file. (All the "original" data files will remain available in the class directories, so please erase your copies of all the lengthy ones -- thanks!)


    Exercise 3.

    This problem, of a sociological nature, involves linking C++ code with C and Fortran routines.

    In U.S. presidential elections, the President is not chosen by popular vote. Instead, a candidate must win a majority of only 538 "electoral votes"; each State gets to provide electoral votes equal to the number its of U.S. Senators and Representatives (e.g., Utah gets 5 electoral votes). In all but two States (Maine and Nebraska), the winner of a State's popular vote gets all of its electoral votes. For example, if two candidates in California got 50.01% and 49.99% of the popular vote respectively, the winner gets all of California's 54 electoral votes. Owing to "discretization noise" in the process of converting individual States' popular votes into a small number of electoral votes, it is actually possible for the winner of the national popular vote to actually lose the election. This has happened twice in the history of the U.S. (Please see the National Archives and Records website for more information.) This situation is even more likely to occur when there are more than two dominant political parties.

    In this exercise, you are to write a single C++ code, vote.cc, to calculate the answer to these two questions:

    To answer these questions, use codes in ~p5720/src called gerf.f (Fortran 77 code) and binom.c (C code). The first of these calculates the integral

           Gerf(x) = Integrate[t=0 to t=x] dt exp(-(t^2)/2)/sqrt(2*pi)
    
    The second calulates the natural log of the binomial coefficient binom(n,k). With the approriate change of variables, the gerf.f code can answer the first question, while the binom.c code can help answer the second. Note that with the Binomial distribution, the numbers involved are typically so large (e.g., 0.5^500000= ~10^-150000) that your machine cannot represent them. Thus work with natural logarithms and convert to back to standard double precision numbers only at the end of the calculation.

    (For more information about the statistics involved here, please check out this Statistics Page to get some basic information on the Binomial and Gaussian (or "Normal") distribution. This site from Mathematica is unfortunately shut down for legal reasons but when it comes back up it will be a good resource.... If you find a good statistics website please let the instructor know -- thanks!)

    At the heart of this problem is not statistical analysis, but linking Fortran, C, and C++ codes together. This Lab Exercise on linking will help you figure out how to do this.

    Turn in only your code vote.cc. Be sure the answers to the two problems are clearly printed out when your code is executed.


    Exercise 4.

    The C code binom.c in the ~p5720/src directory was hacked together by the instructor in 10 minutes. It is not very well designed; in particular log_binom_coeff(n,k) fails to use nice things like the Stirling approximation (e.g., log(n!) -> n*log(n)) in the large n limit, or the fact that in this limit the Binomial distribution is well approximated by a Guassian.

    Starting with NIST's GAMS website, find a code (C or Fortran) which provides binomial coefficients in double precision (most [all?] codes provided by GAMS give answers accurate to machine precision).

    Turn in a text file called binom.txt with the URL (http://...) containing a link to downloadable source code that calculates binomial coefficients in double precision.


    Exercise 5.

    This exercise is a plea for help from the instructor. A student in this course was running the code for Assignment 7, Exercise 1, and discovered that the drand() random number generator actually gave a double precision number that was less than 10^-16. The likelihood that this can happen is very small. (If we ran drand in an infinite loop, generating a million numbers a second [a modern workstation can do this easily] then we should get such a small number on average every thousand years or so.)

    Is the drand() code broken?

    To help answer this question, do two things:

  • Run the executable code ~p5720/bin/rndexp_test as follows:
        nice +10 ~p5720/bin/rndexp_test NUMBER SEED >&! rndexp_test.log &
    
    where NUMBER is a pretty big integer, say 1000000 to 1000000000 (a million to a billion), and SEED is a seed for the drand() random number generator. Comments:

    The code can generate about a million numbers per second, so use this rate to pick the argument NUMBER in the command line, depending on how much time you want to wait around for the code to run. Also, it would be good to use a SEED that was different from everyone else's in the class. How about picking a seed from your Utah Driver's License number (but don't use your SSN!)? Or some scrambled form of your UofU ID NUmber?

    Ultimately, you will submit the rndexp_test.log file, but meanwhile, run the code as shown above and move on to the next part of the assignment.

  • Run either your version of rndexp from Assignment 7, Exercise 1 (or the executable ~p5720/bin/rndexp), so that you create a text file with 20000 exponentially distribute random numbers with a unit mean (tau=1). Use "sm" (or "gnuplot" if you want) to create a histogram of your random numbers, and plot the underlying distribution function, p(x)=exp(-x), on top of the histogram. The idea here is we'd like to check that the exponential RNG is actually spitting out exponentially distributed numbers. (We should really do a more direct check on drand's ability to make uniform random numbers -- oh well.) If there is a horrid mismatch between the predicted distribution and the histogram, there is a big problem.

    Store the commands used to create your histogram in a script, rndexp.sm or .gpl) for example.

    Hint: When building the histogram and comparing with an exponential, you must either normalize the histogram or scale your exponential by a factor of N=20000*delta_x. Either way is Ok.

    Also, for a new and better (?) way of getting a .jpg file out of sm, try the instructions posted at this link.

    Note: There will be some scatter, or error, in the histogram relative to the exponential, and this is Ok even for a good RNG. The thing to check is that the scatter is not much bigger than the "discreteness noise" from using finite histogram bin sizes and a limited number of sample points. An estimate the standard deviation (characteristics size) of the error is sqrt(N_i), where N_i is the number of random numbers in the i^th bin. In this exercise, you do not have to provide an estimate of the errors.

    Submit the output of your rndexp_test run rndexp_test.log, along with your plotting script (e.g. rdnexp.sm) and histogram plot in jpeg form, exphist.jpg, as in Assignment 4, Exercise 3.


    bcb 15-Oct-99.