Multi-machine environments running GrISU can produce large simulation databases in a reasonable amount of time. However, organizing the many output files by hand is a daunting task at best. The run_sim perl script included in the GrISU package handles this organization with a minimum of work beyond setting up the initial pilot files and the runsim.pilot file. It may be used on Linux clusters or on Beowulf clusters with NFS-mounted file systems.
The remainder of this page documents the use of the run_sim script, both for an OSLAF-like cluster and a standard Linux network. In the following text, I refer to the Linux server or the OSLAF master node as the "master node" and to an individual machine in the network or to an OSLAF slave node as "slave node" or just "node".
In producing MC simulations using GrISU on a cluster of computers, each node acts as an independent processor. Since the master node is NFS mounted on each slave node (see the following diagram), each slave node has access to files on the master node, and there is no need to transfer code or pilot files to the slave nodes. However, each slave node must use its own pilot files (at least with different random number seeds) and must store its output files in an appropriate location, either on the master node or on the slave node.
The following diagram shows a typical setup for producing simulations. Each node has its own "GrISU" directory, in this case GrISU/0 and GrISU/1 on the master node. By establishing appropriate links to the GrISU directories and creating the Data and Dump subdirectories, GrISU/0 and GrISU/1 can each become a "GrISU" directory for the two nodes respectively.
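run_sim establishes these links itself, but the resulting per-node layout can be sketched by hand. In this sketch the directory names are examples, and Config stands in for the several shared GrISU directories:

```shell
# Sketch of the per-node layout described above; run_sim normally creates the
# links itself. "Config", "0", and "1" are example names.
cd "$(mktemp -d)"                      # scratch area standing in for the home directory
mkdir -p GrISU/Config                  # stand-in for the shared master GrISU tree
mkdir -p GrISU/0 GrISU/1               # per-node "GrISU" directories
for n in 0 1; do
    ln -s ../Config "GrISU/$n/Config"  # link back to the shared directories
    mkdir -p "GrISU/$n/Data" "GrISU/$n/Dump"
done
ls "GrISU/0"                           # Config  Data  Dump
```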
Using the run_sim script
I'll start from the beginning for completeness.
a. Unpack GrISU in your home directory on the master node (either the server or the OSLAF master node). Doing so will create all of the GrISU directories including the main directory GrISU.
b. Within the main GrISU directory, create directories GrISU/0, GrISU/1, ..., GrISU/9 (or use directory names appropriate for networked machines), one for each machine. These directories are on the server or OSLAF master node. At Grinnell I use node names: grin1, grin2, grin3, ...
c. Copy runsim.pilot from GrISU/Utilities/Runsim to GrISU.
d. Copy the perl script, run_sim from GrISU/Utilities/Runsim to GrISU.
e. After modifying runsim.pilot (more about this later), copy runsim.pilot from GrISU/ to each of the 10 node directories.
f. Modify the pilot files in GrISU according to your wishes.
g. Start the simulations on each node; the perl script, run_sim, does the rest.
OSLAF: simply "cd GrISU/node_number" and execute "../run_sim &" from the master node.
Linux cluster: ssh to the node machine, cd to GrISU/node_name, and execute "../run_sim &".
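Steps a through f can be sketched as shell commands. This is a minimal sketch: the GrISU tree is faked here with empty files (the real one comes from unpacking the distribution in step a), and only two node directories are created:

```shell
# Minimal sketch of steps a-f; node directories 0 and 1 are examples.
cd "$(mktemp -d)"                                   # stand-in for the home directory
mkdir -p GrISU/Utilities/Runsim                     # fake unpacked tree (step a)
touch GrISU/Utilities/Runsim/runsim.pilot \
      GrISU/Utilities/Runsim/run_sim
mkdir -p GrISU/0 GrISU/1                            # node directories (step b)
cp GrISU/Utilities/Runsim/runsim.pilot GrISU/       # step c
cp GrISU/Utilities/Runsim/run_sim GrISU/            # step d
for n in 0 1; do                                    # step e (after editing runsim.pilot)
    cp GrISU/runsim.pilot "GrISU/$n/"
done
# step g, per node:  cd GrISU/0 && ../run_sim &
```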
I recommend using the unix "screen" command for
each node. This enables you to log out of OSLAF
or your networked machine without stopping your simulation run.
1. Type "screen"
2. "../run_sim &"
3. "ctrl-A ctrl-D" hides the screen. You can then create another screen for another node. Later, to reactivate a screen, type "screen -r <screen number>". If you have created more than one screen, "screen -r" will give you a list of screen numbers. Also, see "man screen".
Or you may prefer to use "vnc", which is normally a bit slow...
You'll notice two additional files in the Runsim directory, run_grisudet and rungrisudet.pilot. If you ever wish to run a series of photon.cph files through grisudet and the following codes, you'll need the run_grisudet script, which automates everything for you. You'll find the documentation later on this page.
Example runsim.pilot File
"Pilot_runsim.txt" file for perl script "run_sim".
"run_sim" must be executed from a directory on top of
GrISU, e.g. GrISU/0.
Data lines begin with an asterisk, otherwise ignored.
Number of runs, default 1
Execute runs on Oslaf node (1), default 0
The script always runs kascade and cherenkf7. The kascade.pilot
must call for output to stdout
Run grisudet (1), default 0
Run analysis (1), default 0
Run cutspec (1), default 0
Copy kascade.pilot from GrISU directory (1), default 0
You must first run the "randomlist" code to create the randomnumber.txt list
gzip cph files (1), default 0 (do not compress)
Specify storage directory, default, don't move output files
Pilot File Comments (Answers to FAQs)
A. Each machine has a directory GrISU/node, where "node" is the name of the machine. There is a separate node directory for each machine. When running on a Beowulf cluster, the name of the node is the number of the Beowulf node. For the ISU cluster, there is a number for each processor: "0" and "00" for slave machine 0, "1" and "01" for slave machine 1, etc. I only use the single-digit processors since the OSLAF software will run kascade on one processor and cherenkf7 on the second processor.
B. Each GrISU/node directory will act as the home GrISU directory for that machine. The run_sim perl script sets up the links to GrISU/Config and the other shared GrISU directories.
C. In the pilot_runsim.txt file, you'll see options to:
select the number of runs, where each run is a repeat, with different random number seeds, of the showers defined in kascade.pilot and cherenkov.pilot, etc. For example, you could set up 20 runs where the files from each run would contain 50,000 showers.
choose to execute grisudet, analysis, and/or cutspec (cutspec performs preliminary cutting and sets up the energies.log and mcarlo.log files for the energy spectra analysis). Kascade and cherenkf7 will always execute. Be sure to set up kascade and cherenkov so that the output of kascade will pipe into cherenkov.
copy the pilot files from GrISU to GrISU/node. You may have already
copied these pilot files and set their parameters appropriately for
this node. In this case, run_sim should not copy these files from GrISU.
change the random number seeds in kascade.pilot, cherenkov.pilot, and
in detector.pilot prior to running kascade, cherenkf7, and grisudet. The
perl script draws random numbers from the randomnumber.txt list in the GrISU directory.
move all files to a single storage directory.
gzip the photon file, e.g. photon.cph.
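The kascade-to-cherenkov pipe and the gzip option can be illustrated as below. The stand-in shell functions only show the plumbing; the real binaries come from the GrISU build, and the file names are taken from the text above:

```shell
cd "$(mktemp -d)"
# Stand-ins so the sketch runs anywhere: real kascade reads kascade.pilot and
# writes showers to stdout; real cherenkf7 reads showers on stdin.
kascade()   { cat; }
cherenkf7() { tr 'a-z' 'A-Z'; }
printf 'shower\n' > kascade.pilot
# The pipeline run_sim drives for every run:
kascade < kascade.pilot | cherenkf7 > photon.cph
gzip photon.cph            # the optional compression step (gzip pilot option)
ls                         # kascade.pilot  photon.cph.gz
```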
D. At the end of each run, the script tags each file with
a node number and a run number. For example, after completing run 2 on node
5, the photon.cph cherenkf photon file will be named node5.run2.photon.cph (the
node can have any alphanumeric name). All output files will accumulate in the
directories specified in the pilot files. If a storage directory is given
in runsim.pilot, then at the conclusion of each run, all output files will be moved to this directory. This option permits the generation of files on a local disk before moving the output files to a disk elsewhere on the network. At Grinnell this is important: it avoids the serious network traffic that would occur if all nodes used a common storage directory during shower generation. On a Beowulf cluster, this option may similarly be used to move the files to a common NFS-mounted node, e.g. the master node, at the conclusion of each run.
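The tagging and moving described in D can be sketched as follows; the node/run labels and the storage path are examples (the real labels come from the node directory name and the run counter):

```shell
cd "$(mktemp -d)"
node=5; run=2                          # example node and run labels
touch photon.cph                       # pretend cherenkf7 just wrote this file
mv photon.cph "node${node}.run${run}.photon.cph"
mkdir -p storage                       # stand-in for the storage directory option
mv "node${node}.run${run}.photon.cph" storage/
ls storage                             # node5.run2.photon.cph
```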
E. The script contains an option for running on OSLAF, the ISU cluster. This cluster uses the OSCAR system, and the run_sim script uses the OSCAR "cexec :node" command. Thus, to use OSLAF and similar systems, the master node must be NFS mounted on each slave node.
Each processor has a node number, e.g. slave machine 1 has processor numbers 1 and 01. Thus, from GrISU set up a node for each processor you plan to use, e.g. GrISU/1 and GrISU/01 for machine 1, and likewise for machines 0 through 9.
Set up the pilot files in each node directory, choosing the OSLAF option. Then, execute the run_sim script ("../run_sim &") within each directory on the master node. The script executes the programs and manages the output files on each node; it determines the node from the node directory name. Notice that you move no files to the slave node disks since the master node is NFS mounted on each slave machine.
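The dispatch in E amounts to one cexec call per node. This sketch (the node number is an example) only runs the command where the C3/OSCAR tools are installed, and prints what it would do otherwise:

```shell
# Hedged illustration of the OSCAR "cexec :node" call quoted above.
run_on_node() {
    if command -v cexec >/dev/null 2>&1; then
        cexec ":$1" hostname                        # execute "hostname" on slave node $1
    else
        echo "would run on node $1: cexec :$1 hostname"
    fi
}
run_on_node 1
```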
F. On OSLAF, I set up a screen (using the UNIX "screen" command) for each node directory. I then start run_sim from within that screen window. Ctrl-A Ctrl-D will place the window in the background and you can proceed to the next window. When you log out, all screens are still active and you can later return to each screen (or window) with the "screen -r" command. You can check "man screen" for more details. Others use the "vnc" software; I've found this very slow for the connection between Grinnell and Ames. As yet, I have not been able to use "nohup ../run_sim" without problems that I don't understand.
Should you wish to run a series of photon.cph files through grisudet and the following codes, you'll need to use run_grisudet. This perl script uses the pilot file, pilot_rungrisudet. The operation is very similar to that of run_sim, requiring that you use node directories, such as GrISU/node. The instructions are included in the example pilot file in Utilities/Runsim and shown below.
Example rungrisudet.pilot pilot file
Pilot_rungrisudet.txt file to control perl script run_grisudet. This
script executes grisudet and following codes for a series of photon.cph
grisudet input files.
Copy this pilot file to a node directory, e.g. GrISU/node; copy run_grisudet
to GrISU (run_grisudet is stored in GrISU/Utilities/Runsim). Execute the
script from within the node directory.
Output files are given the same node and run designation as the cph input files.
Data lines begin with an asterisk, otherwise ignored
Execute runs on Oslaf node (1), default 0
List of photon.cph files to run through grisudet, etc. There must be one file per data line.
Run analysis (1), default 0
Run cutspec (1), default 0
Copy detector.pilot from GrISU directory (1), default 0
Copy analyze.pilot from GrISU directory (1), default 0
Copy cutspec.pilot from GrISU directory (1), default 0
Change detector.pilot seed (1), default 0
Specify storage directory, default, don't move output files.