Running MC Simulations using GrISU on OSLAF
or on Machines on a Standard Linux Network

Multi-machine environments running GrISU can produce large simulation databases in a reasonable amount of time. However, organizing the many output files by hand is a daunting task at best. The run_sim perl script included in the GrISU package handles this organization with a minimum of work beyond setting up the initial pilot files and the runsim.pilot file. It may be used on standard Linux networks or on Beowulf clusters with NFS-mounted file systems.

The remainder of this page documents the use of the run_sim script, both for an OSLAF-like cluster and a standard Linux network. In the following text, I refer to the Linux server or the OSLAF master node as the "master node" and to an individual machine in the network or to an OSLAF slave node as "slave node" or just "node".

In producing MC simulations using GrISU on a cluster of computers, each node acts as an independent processor. Since the master node is NFS-mounted on each slave node (see the following diagram), each slave node has access to files on the master node, and there is no need to transfer code or pilot files to the slave nodes. However, each slave node must use its own pilot files (at least with different random-number seeds) and must store its output files in an appropriate location, either on the master node or on the slave node.

The following diagram shows a typical setup for producing simulations. Each node has its own "GrISU" directory, in this case GrISU/0 and GrISU/1 on the master node. By establishing appropriate links to the GrISU directories and creating the Data and Dump subdirectories, GrISU/0 and GrISU/1 can each become a "GrISU" directory for the two nodes respectively.

[Diagram: GrISU/0 and GrISU/1 node directories on the master node, NFS-mounted on each slave node]
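Sketched as a directory tree, a typical setup on the master node looks something like this (an illustration only; the Config and Simulation links stand for the links that run_sim creates, described in item B of the comments below, and grin-style names may replace the numbers on a Linux network):

    GrISU/                              on the master node, NFS-mounted on each slave node
        run_sim                         copied from GrISU/Utilities/Runsim
        runsim.pilot                    master copy, edited and then copied to each node directory
        kascade.pilot, cherenkov.pilot, detector.pilot, ...
        0/                              "GrISU" directory for node 0
            runsim.pilot                per-node copy
            Config -> ../Config         links set up by run_sim
            Simulation -> ../Simulation
            Data/                       per-node output
            Dump/
        1/                              "GrISU" directory for node 1
            ...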

Using the run_sim script

I'll start from the beginning for completeness.

a. Unpack GrISU in your home directory on the master node (either the server or the OSLAF master node). Doing so will create all of the GrISU directories including the main directory GrISU.

b. Within the main GrISU directory, create node directories GrISU/0, GrISU/1, ..., GrISU/9 (or use directory names appropriate for networked machines), one for each machine. These directories are on the server or OSLAF master node. At Grinnell I use node names: grin1, grin2, grin3, ...

c. Copy runsim.pilot from GrISU/Utilities/Runsim to GrISU.

d. Copy the perl script run_sim from GrISU/Utilities/Runsim to GrISU.

e. After modifying runsim.pilot (more about this later), copy runsim.pilot from GrISU/ to each of the 10 node directories.

f. Modify the pilot files in GrISU according to your wishes.

g. Start the simulations on a node: "cd GrISU/node_number", then "../run_sim &". The perl script run_sim does everything. (A command sketch covering steps a through g appears below.)

OSLAF: simply "cd GrISU/node_number" and execute "../run_sim &" from the master node.

Linux Cluster: ssh to the node machine, cd to GrISU/node_name, execute "../run_sim &".
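Put together, steps a through g amount to something like the following shell session on the master node (a sketch only; the archive name GrISU.tar.gz and the machine name grin1 are examples):

    # step a: unpack GrISU in your home directory
    cd ~
    tar -xzf GrISU.tar.gz
    cd GrISU
    # step b: one node directory per machine (or grin1, grin2, ... on a Linux network)
    mkdir 0 1 2 3 4 5 6 7 8 9
    # steps c and d: copy runsim.pilot and run_sim up to GrISU
    cp Utilities/Runsim/runsim.pilot Utilities/Runsim/run_sim .
    # step e: after editing runsim.pilot, copy it into each node directory
    for n in 0 1 2 3 4 5 6 7 8 9; do cp runsim.pilot "$n"; done
    # step f: edit kascade.pilot, cherenkov.pilot, detector.pilot, ... in GrISU
    # step g (OSLAF): start a node directly from the master node
    cd 0
    ../run_sim &
    # step g (Linux network): ssh to the machine first, for example
    #   ssh grin1
    #   cd GrISU/grin1
    #   ../run_sim &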

I recommend using the unix "screen" command for each node. This enables you to log out of OSLAF
or your networked machine without stopping your simulation run.

1. Type "screen"

2. Type "../run_sim &"

3. "ctrl-A ctrl-D" hides (detaches) the screen. Then you can create another screen for another node. Later, to reactivate a screen, type "screen -r <screen number>". If you have created more than one screen, "screen -r" will give you a list of screen numbers. Also, see "man screen".
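For one node, the whole sequence looks like this (the node directory is an example; Ctrl-A Ctrl-D is typed at the keyboard, not entered as a command):

    cd GrISU/0                    # or GrISU/grin1 on a networked machine
    screen                        # 1. start a new screen session
    ../run_sim &                  # 2. launch the simulations inside the screen
    #                               3. press Ctrl-A then Ctrl-D to detach; repeat for the next node
    screen -ls                    # later: list your screen sessions
    screen -r <screen number>     # reattach to a particular session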

Or you may prefer to use "vnc", which is normally a bit slow...

You'll notice in the Runsim directory two additional files, run_grisudet and rungrisudet.pilot. If you ever wish to run a series of photon.cph files through grisudet and the codes that follow it, you'll need the run_grisudet script, which automates everything you would otherwise do by hand. You'll find the documentation later on this page.

Example runsim.pilot File

"Pilot_runsim.txt" file for perl script "run_sim".

"run_sim" must be executed from a directory on top of GrISU, e.g. GrISU/0.
The script identifies the node number, in this example "0". For runs on OSLAF,
the script will execute simulations on the correct node, in this example node 0. For
simulations on a Linux network, the script will execute simulations on the computer
currently in use and will identify files with a node name. For example, at Grinnell, I can
ssh into computer "grin1", move to directory GrISU/grin1 and execute ../run_sim. The
script sets up GrISU/grin1 as the "GrISU" directory for grin1 and adds node names
and run numbers to the output files, e.g. "nodegrin1.run1.photon.cph". The code executes
on the local machine. Output files may be stored on either the local machine or the
master node depending on directories specified in the pilot files.

Data lines begin with an asterisk; all other lines are ignored (treated as comments).
To accept a default value, leave out the * so the line becomes a comment rather than a data line.

Number of runs, default 1
Each run repeats the simulation using the same pilot files but possibly with
different random-number seeds (see below)
* NUMBR 1

Execute runs on Oslaf node (1), default 0
The run_sim script also applies to a normal Linux network environment
where a server disk is NFS-mounted on each machine. Then, you "ssh"
to each machine and start "../run_sim" from the GrISU/machine directory.
In this case remove the asterisk to default to 0.
* OSLAF 1

The script always runs kascade and cherenk7. The kascade.pilot must call for stdout
for the output of kascade.

Run grisudet (1), default 0
* GRISU 1

Run analysis (1), default 0
* ANALZ 1

Run cutspec (1), default 0
* CUTSP 1

Copy kascade.pilot from GrISU directory (1), default 0
* COPYK 1
Copy cherenkov.pilot from GrISU directory (1), default 0
* COPYC 1
Copy detector.pilot from GrISU directory (1), default 0
* COPYP 1
Copy analysis.pilot from GrISU directory (1), default 0
* COPYA 1
Copy cutspec.pilot from GrISU directory (1), default 0
* COPYN 1

You must first run the "randomlist" code in GrISU/CommonTools to
create the list of random numbers in the randomlist.txt file.
Change kascade.pilot seed (1), default 0
* SEEDK 1
Change cherenkov.pilot seeds (1), default 0
* SEEDC 1
Change detector.pilot seed (1), default 0
* SEEDG 1

gzip cph files (1), default 0 (do not compress)
* COMPC 1

Specify storage directory, default, don't move output files
You may specify a common storage directory to accumulate all output files from
all nodes. The script adds run and node identifiers to the filenames, e.g.
run1.node0.photon.cph.gz, etc. In this example, the storage directory is on the
server or master node.

* STORE /home/duke/StoreRuns
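To make the comment-versus-data-line convention concrete, the data lines for a node on a standard Linux network might reduce to something like the following (a sketch only; 20 runs and the storage path are examples taken from elsewhere on this page, and the OSLAF line is left as a comment so that it defaults to 0):

    * NUMBR 20
    OSLAF 1        no leading asterisk: this is a comment line, so OSLAF keeps its default of 0
    * GRISU 1
    * ANALZ 1
    * CUTSP 1
    * COPYK 1
    * COPYC 1
    * COPYP 1
    * COPYA 1
    * COPYN 1
    * SEEDK 1
    * SEEDC 1
    * SEEDG 1
    * COMPC 1
    * STORE /home/duke/StoreRuns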

Pilot File Comments (Answers to FAQs)

A. Each machine has a directory GrISU/node, where "node" is the name of the machine; there is a separate node directory for each machine. When running on a Beowulf cluster, the name of the node is the number of the Beowulf node. For the ISU cluster, there is a number for each processor: "0" and "00" for slave machine 0, "1" and "10" for slave machine 1, etc. I use only the single-digit processors, since the OSLAF software will run kascade on one processor and cherenkf7 on the second processor.

B. Each GrISU/node directory will act as the home GrISU directory for that
machine. The run_sim perl script sets up the links to GrISU/Config,
GrISU/Simulation, etc.

C. In the pilot_runsim.txt file, you'll see options to:

select the number of runs, where each run repeats the showers defined in kascade.pilot and cherenkov.pilot, etc., with different random-number seeds. For example, you could set up 20 runs where the files from each run would contain 50,000 showers.

choose to execute grisudet, analysis, and/or cutspec (cutspec performs preliminary cutting and sets up the energies.log and mcarlo.log files for the energy-spectra analysis). Kascade and cherenkov will always execute. Be sure to set up kascade and cherenkov so that the output of kascade will pipe into cherenkov.

copy the pilot files from GrISU to GrISU/node. You may have already
copied these pilot files and set their parameters appropriately for
this node. In this case, run_sim should not copy these files from
GrISU.

change the random number seeds in kascade.pilot, cherenkov.pilot, and
in detector.pilot prior to running kascade, cherenkf, and grisudet. The
perl script draws random numbers from the randomnumber.txt list in
GrISU/CommonTools.

move all files to a single storage directory.

gzip the photon file, e.g. photon.cph.

D. At the end of each run, the script tags each output file with a node number and a run number. For example, after completing run 2 on node 5, the cherenkf photon file photon.cph will be named node5.run2.photon.cph (the node can have any alphanumeric name). All output files accumulate in the directories specified in the pilot files. If a storage directory is given in runsim.pilot, then at the conclusion of each run all output files are moved to this directory. This option permits the generation of files on a local disk before moving them to a disk elsewhere on the network. At Grinnell this is important: it avoids the serious network traffic that would arise if all nodes wrote to a common storage directory during shower generation. On a Beowulf cluster, this option may similarly be used to move the files to a common NFS-mounted node, e.g. the master node, at the conclusion of each run.

E. The script contains an option for running on OSLAF, the ISU Beowulf
cluster. This cluster uses the OSCAR system, and the run_sim script uses the
OSCAR "cexec :node" command. Thus, to use OSLAF and similar systems, the
master node must be NFS-mounted on each slave node.

Each processor has a node number, e.g. slave machine 1 has processor numbers 1 and 01. Thus, from GrISU set up a node for each processor you plan to use, e.g. GrISU/1 and GrISU/01 for machine 1, and likewise for machines 0 through 9.

Set up the pilot files in each node directory, choosing the OSLAF option. Then execute the run_sim script (../run_sim) within each directory on the master node. The script executes the programs and manages the output files on each node; it determines the node from the node directory name. Notice that you move no files to the slave-node disks, since the master node is NFS-mounted on each slave machine.
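For example, with GrISU/1, GrISU/2, ... set up and the OSLAF option selected in each runsim.pilot, the launch from the master node reduces to (a sketch; combine it with screen as described in item F below):

    cd ~/GrISU/1          # node directory for slave machine 1
    ../run_sim &          # run_sim reads the node from the directory name and uses cexec
    cd ~/GrISU/2          # next slave machine, and so on
    ../run_sim &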

F. On OSLAF, I set up a screen (using the UNIX "screen" command) for each node directory. I then start run_sim from within that screen window. Ctrl-A Ctrl-D places the window in the background, and you can proceed to the next window. When you log out, all screens remain active, and you can later return to each screen (or window) with the "screen -r" command. See "man screen" for more details. Others use the "vnc" software; I've found this very slow for the connection between Grinnell and Ames. As yet, I have not been able to use "nohup ../run_sim" without problems that I don't understand.

Using the run_grisudet script

Should you wish to run a series of photon.cph files through grisudet and the codes that follow it, you'll need to use run_grisudet. This perl script uses the pilot file rungrisudet.pilot. The operation is very similar to that of run_sim, requiring that you use node directories such as GrISU/node. The instructions are included in the example pilot file in Utilities/Runsim and shown below.
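The mechanics mirror those of run_sim (a sketch; the node directory name node1 and the trailing & are examples):

    cd ~/GrISU
    cp Utilities/Runsim/run_grisudet .               # the script lives in GrISU, like run_sim
    cp Utilities/Runsim/rungrisudet.pilot node1/     # one pilot file per node directory
    # edit node1/rungrisudet.pilot: list the photon.cph files between * CPHFL and * ENDFL
    cd node1
    ../run_grisudet &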

Example rungrisudet.pilot File
Pilot_rungrisudet.txt file to control the perl script run_grisudet. This
script executes grisudet and the codes that follow it for a series of photon.cph
grisudet input files.

Copy this pilot file to a node directory, e.g. GrISU/node; copy run_grisudet to GrISU (run_grisudet is stored in GrISU/Utilities/Runsim). Execute the script, i.e.
../run_grisudet, from GrISU/node.

Output files are given the same node and run designation as the cph files
given in this pilot file.

Data lines begin with an asterisk; all other lines are ignored.

Execute runs on Oslaf node (1), default 0
* OSLAF 0

List of photon.cph files to run through grisudet, etc. There must be at
least one file between * CPHFL and * ENDFL. The script determines whether or
not to gunzip each file, and gzips the file again afterwards if it was gzipped
initially.
* CPHFL 2
/home/duke/Stephan/GrISU/node1.run2.photon.cph.gz
/home/duke/Stephan/GrISU/node2.run3.photon.cph.gz
* ENDFL

Run analysis (1), default 0
* ANALZ 1

Run cutspec (1), default 0
* CUTSP 1

Copy detector.pilot from GrISU directory (1), default 0
* COPYP 1

Copy analyze.pilot from GrISU directory (1), default 0
* COPYA 1

Copy cutspec.pilot from GrISU directory (1), default 0
* COPYN 1

Change detector.pilot seed (1), default 0
* SEEDG 0

Specify storage directory, default, don't move output files. The
photon.cph files are never moved from their original location (as given
under the CPHFL flag).
* STORE Store