7. Run A Program

Once you can do all the tests shown above, you should be able to run a program. From here on, the instructions are LAM-specific.

Go back to the head node, log in as wolf, and enter the following commands:

cat > /mnt/wolf/lamhosts 
wolf01 
wolf02 
wolf03 
wolf04 
<control d>
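As an aside, LAM's boot schema format also lets a host line carry a CPU count, so a multi-processor box can be scheduled more than one process. A sketch of what that would look like (the counts here are made up for illustration; my wolf nodes are single-CPU):

```
wolf01 cpu=2
wolf02 cpu=2
wolf03
wolf04
```

With CPU counts in place, mpirun's "C" nomenclature (run one process per CPU) becomes useful.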

Go to the LAM examples directory, and compile "hello.c":

mpicc -o hello hello.c 
cp hello /mnt/wolf 
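The hello.c in the LAM examples directory is essentially the classic MPI hello-world. A minimal version along these lines would produce output like that shown below; the exact printf wording is my reconstruction, not necessarily the distributed source:

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);                /* start the MPI runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total process count */

    printf("Hello, world! I am %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}
```

Note that the ranks print in whatever order the processes happen to reach the printf, which is why the output below is not sorted.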

Then, as shown in the LAM documentation, start up LAM:

[wolf@wolf00 wolf]$ lamboot -v lamhosts 
LAM 7.0/MPI 2 C++/ROMIO - Indiana University 
n0<2572> ssi:boot:base:linear: booting n0 (wolf00) 
n0<2572> ssi:boot:base:linear: booting n1 (wolf01) 
n0<2572> ssi:boot:base:linear: booting n2 (wolf02) 
n0<2572> ssi:boot:base:linear: booting n3 (wolf04) 
n0<2572> ssi:boot:base:linear: finished

So we are now finally ready to run an app. [Remember, I am using LAM; your MPI implementation may have different syntax].

[wolf@wolf00 wolf]$ mpirun n0-3 /mnt/wolf/hello 
Hello, world! I am 0 of 4 
Hello, world! I am 3 of 4 
Hello, world! I am 2 of 4 
Hello, world! I am 1 of 4 
[wolf@wolf00 wolf]$

Recall I mentioned the use of NFS above. I am telling the nodes to all use the NFS-shared directory, which will become a bottleneck as the number of boxes grows. You could just as easily copy the executable to each box and specify a node-local path in the mpirun command: mpirun n0-3 /home/wolf/hello. The prerequisite is that all the files be available locally on every node. In fact I have done this, and it worked better than using the NFS-shared executable. Of course, this approach breaks down if my cluster application needs to modify a file shared across the cluster.
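One way to stage the binary locally, assuming the passwordless ssh set up earlier is working, is a simple shell loop on the head node (hostnames match my cluster; adjust to yours):

```shell
# copy the executable into each node's local home directory
for node in wolf01 wolf02 wolf03 wolf04; do
    scp /mnt/wolf/hello ${node}:/home/wolf/hello
done
```

After that, mpirun n0-3 /home/wolf/hello runs entirely from local disks.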