Message Passing Interface (MPI) Exercise


  1. Login to the workshop machine

    Workshops differ in how this is done. The instructor will go over this beforehand.

  2. Copy the example files

    1. In your SP home directory, create a subdirectory for the MPI test codes and cd to it.
      mkdir ~/mpi
      cd  ~/mpi

    2. Copy either the Fortran or the C version of the parallel MPI exercise files to your mpi subdirectory:

      C:
      cp  /usr/global/docs/training/blaise/mpi/C/*   ~/mpi
      Fortran:
      cp  /usr/global/docs/training/blaise/mpi/Fortran/*   ~/mpi
      

    3. Some of the example codes have serial versions for comparison. If you are interested in comparing/running the serial versions of the exercise codes, use the appropriate command below to copy those files to your mpi subdirectory also.

      C:
      cp  /usr/global/docs/training/blaise/mpi/Serial/C/*   ~/mpi
      
      Fortran:
      cp  /usr/global/docs/training/blaise/mpi/Serial/Fortran/*   ~/mpi 
      

  3. List the contents of your MPI subdirectory

    You should notice quite a few files. The parallel MPI versions have names which begin with or include mpi_. The serial versions have names which begin with or include ser_. Makefiles are also included.

    Note: These are example files, and as such, are intended to demonstrate the basics of how to parallelize a code. Most execute in a second or two. The serial codes will be faster because the problem sizes are so small and there is none of the overhead associated with parallel setup and execution.

    C files, Fortran files and the description for each example:

    Array Decomposition
      C:        mpi_array.c, ser_array.c
      Fortran:  mpi_array.f, ser_array.f

    Matrix Multiply
      C:        mpi_mm.c, ser_mm.c
      Fortran:  mpi_mm.f, ser_mm.f

    pi Calculation - point-to-point communications
      C:        mpi_pi_send.c, dboard.c, ser_pi_calc.c
      Fortran:  mpi_pi_send.f, dboard.f, ser_pi_calc.f

    pi Calculation - collective communications
      C:        mpi_pi_reduce.c, dboard.c, ser_pi_calc.c
      Fortran:  mpi_pi_reduce.f, dboard.f, ser_pi_calc.f

    Concurrent Wave Equation
      C:        mpi_wave.c, draw_wave.c, ser_wave.c
      Fortran:  mpi_wave.f, mpi_wave.h, draw_wave.c, ser_wave.f

    2D Heat Equation
      C:        mpi_heat2D.c, draw_heat.c, ser_heat2D.c
      Fortran:  mpi_heat2D.f, mpi_heat2D.h, draw_heat.c, ser_heat2D.f

    Round Trip Latency Timing Test
      C:        mpi_latency.c
      Fortran:  mpi_latency.f

    Bandwidth Timing Test
      C:        mpi_bandwidth.c
      Fortran:  mpi_bandwidth.f

    Prime Number Generation
      C:        mpi_prime.c, ser_prime.c
      Fortran:  mpi_prime.f, ser_prime.f

    2D FFT
      C:        mpi_2dfft.c, mpi_2dfft.h, ser_2dfft.c
      Fortran:  mpi_2dfft.f, timing_fgettod.c, ser_2dfft.f

    From the tutorial (one code per topic)
      Blocking send-receive          mpi_ping.c / mpi_ping.f
      Non-blocking send-receive      mpi_ringtopo.c / mpi_ringtopo.f
      Collective communications      mpi_scatter.c / mpi_scatter.f
      Contiguous derived datatype    mpi_contig.c / mpi_contig.f
      Vector derived datatype        mpi_vector.c / mpi_vector.f
      Indexed derived datatype       mpi_indexed.c / mpi_indexed.f
      Structure derived datatype     mpi_struct.c / mpi_struct.f
      Groups/Communicators           mpi_group.c / mpi_group.f
      Cartesian Virtual Topology     mpi_cartesian.c / mpi_cartesian.f

    Makefiles
      C:        Makefile.MPI.c, Makefile.Ser.c
      Fortran:  Makefile.MPI.f, Makefile.Ser.f

    Programs with bugs
      C:        mpi_bug1.c, mpi_bug2.c, mpi_bug3.c, mpi_bug4.c, mpi_bug5.c, mpi_bug6.c
      Fortran:  mpi_bug1.f, mpi_bug2.f, mpi_bug3.f, mpi_bug4.f, mpi_bug5.f, mpi_bug6.f

  4. Review the array decomposition example code

    Depending upon your preference, take a look at either mpi_array.c or mpi_array.f. The comments explain how MPI is used to implement a parallel data decomposition on an array. You may also wish to compare this parallel version with its corresponding serial version, either ser_array.c or ser_array.f.
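
    If it helps to see the pattern in miniature first, the sketch below shows the core data decomposition idea in condensed form. It is not the workshop's mpi_array code: the array size is arbitrary, and MPI_Scatter/MPI_Reduce are used here simply as one way to express "distribute equal chunks, work locally, combine the results".

    /* Illustration only - a condensed data decomposition sketch, not the
       workshop's mpi_array code.  Rank 0 initializes a global array, equal
       chunks are distributed to all tasks, each task sums its chunk, and
       the partial sums are combined on rank 0. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define NELEMENTS 1600000   /* arbitrary size; assumed divisible by the task count */

    int main(int argc, char *argv[])
    {
        int ntasks, rank, i, chunksize;
        float *data = NULL, *chunk;
        double mysum = 0.0, total = 0.0;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &ntasks);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        chunksize = NELEMENTS / ntasks;
        chunk = (float *) malloc(chunksize * sizeof(float));

        if (rank == 0) {               /* only the master holds the full array */
            data = (float *) malloc(NELEMENTS * sizeof(float));
            for (i = 0; i < NELEMENTS; i++)
                data[i] = (float) i;
        }

        /* distribute equal chunks, do the local work, then combine the results */
        MPI_Scatter(data, chunksize, MPI_FLOAT, chunk, chunksize, MPI_FLOAT,
                    0, MPI_COMM_WORLD);
        for (i = 0; i < chunksize; i++)
            mysum += chunk[i];
        MPI_Reduce(&mysum, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("Sum of array = %e\n", total);

        free(chunk);
        if (rank == 0)
            free(data);
        MPI_Finalize();
        return 0;
    }

    The workshop example is more elaborate, but the underlying decomposition idea is the same; its comments walk through the details.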

  5. Compile the array decomposition example code

    Invoke the appropriate IBM compiler command:

    C:
    mpcc -q64 -O2 mpi_array.c  -o mpi_array
    
    Fortran:
    mpxlf -q64 -O2 mpi_array.f -o mpi_array 

  6. Set up your execution environment

    In this step you'll set a few POE environment variables, specifically those that answer three questions:

    • How many tasks/nodes do I need?
    • How will nodes be allocated?
    • How will communications be conducted (protocol and network)?

    Set the following environment variables as shown:

    Environment Variable      Description
    setenv MP_PROCS 4         Request 4 MPI tasks
    setenv MP_NODES 1         Specify the number of nodes to use
    setenv MP_RMPOOL 0        Selects the interactive node pool

  7. Run the executable

    Now that your execution environment has been set up, run the array decomposition executable:

    mpi_array
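
    If the output suggests that the wrong number of tasks was started, one quick way to check your POE settings is a tiny program such as the hypothetical check_procs.c sketched below (not one of the exercise files). It prints one line per MPI task; compile it with the same mpcc command used above and run it the same way.

    /* check_procs.c - hypothetical helper, not one of the exercise files.
       Prints one line per MPI task so you can confirm how many tasks were
       actually started and where they are running. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank, ntasks, len;
        char name[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &ntasks);
        MPI_Get_processor_name(name, &len);
        printf("Task %d of %d running on %s\n", rank, ntasks, name);
        MPI_Finalize();
        return 0;
    }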

  8. Try any/all of the other example codes

    The included Makefiles can be used to compile any or all of the exercise codes. For example, to compile all of the parallel MPI codes:

    C:
    make -f Makefile.MPI.c
    
    Fortran:
    make -f Makefile.MPI.f 

    You can also compile selected example codes individually - see the Makefile for details. For example, to compile just the matrix multiply example code:

    C:
    make -f Makefile.MPI.c  mpi_mm
    
    Fortran:
    make -f Makefile.MPI.f  mpi_mm 

    In either case, be sure to examine the makefile to understand the actual compile command used.

    Most of the executables require 4 MPI tasks or fewer. Exceptions are noted below.

    mpi_array
      Requires that MP_PROCS be evenly divisible by 4.

    mpi_group, mpi_cartesian
      mpi_group requires 8 MPI tasks and mpi_cartesian requires 16 MPI tasks.
      You can accomplish this with a combination of the MP_PROCS, MP_NODES and
      MP_TASKS_PER_NODE environment variables.

    mpi_wave, mpi_heat2D
      These examples attempt to generate an X Windows display. You will need to
      make sure that your X Windows environment and software are set up
      correctly. Ask the instructor if you have any questions.

    mpi_latency, mpi_bandwidth
      The mpi_latency example requires only 2 MPI tasks, and the mpi_bandwidth
      example requires an even number of tasks. Setting MP_EUILIB to us and then
      to ip will demonstrate the difference in performance between the User Space
      and Internet communications protocols. Also try comparing communications
      bandwidth when both tasks are on the same node versus on different nodes.
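
    If you are curious what a round-trip timing test looks like in code, the sketch below shows a bare-bones ping-pong loop in the same spirit as mpi_latency. It is an illustration only, not the exercise code: the message size and repetition count are arbitrary, and it assumes exactly 2 MPI tasks.

    /* Illustration only - a bare-bones round-trip (ping-pong) timing loop in
       the spirit of mpi_latency, not the exercise code itself.  Assumes
       exactly 2 MPI tasks; NBYTES and REPS are arbitrary choices. */
    #include <mpi.h>
    #include <stdio.h>

    #define REPS   1000
    #define NBYTES 8            /* increase this to probe bandwidth rather than latency */

    int main(int argc, char *argv[])
    {
        int rank, i;
        char buf[NBYTES] = {0};
        double t1, t2;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_Barrier(MPI_COMM_WORLD);
        t1 = MPI_Wtime();
        for (i = 0; i < REPS; i++) {
            if (rank == 0) {        /* task 0 sends, then waits for the echo */
                MPI_Send(buf, NBYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, NBYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &status);
            } else if (rank == 1) { /* task 1 echoes every message back */
                MPI_Recv(buf, NBYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &status);
                MPI_Send(buf, NBYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        t2 = MPI_Wtime();

        if (rank == 0)
            printf("Average round trip time = %f microseconds\n",
                   (t2 - t1) * 1.0e6 / REPS);

        MPI_Finalize();
        return 0;
    }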

  9. When things go wrong...

    There are many things that can go wrong when developing MPI programs. The mpi_bugX series of programs demonstrates just a few of them. See if you can figure out what the problem is in each case and then fix it.

    Use mpcc -q64 or mpxlf -q64 to compile each code as appropriate.

    The buggy behavior will differ for each example. Some hints are provided below.

    Code       Behavior and hints
    mpi_bug1   Hangs.
    mpi_bug2   Seg fault / coredump / abnormal termination.
    mpi_bug3   Error message.
    mpi_bug4   Hangs and gives the wrong answer. Compare to mpi_array; the
               number of MPI tasks must be divisible by 4.
    mpi_bug5   Dies or hangs, depending upon the AIX and PE versions.
    mpi_bug6   Terminates (under AIX). Requires 4 MPI tasks.
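
    The bug codes themselves are not reproduced here, but as an illustration of the kind of mistake that produces a hang, consider the sketch below (not one of the mpi_bugX programs): both tasks post a blocking receive before either one sends, so neither receive can ever be satisfied.

    /* Illustration only - NOT one of the mpi_bugX codes.  A classic cause of a
       hang: both tasks block in MPI_Recv before either one has sent, so
       neither receive can ever complete.  Assumes exactly 2 MPI tasks. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank, other, msg = -1;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        other = 1 - rank;              /* the partner task (0 <-> 1) */

        /* deadlock: both tasks wait here forever */
        MPI_Recv(&msg, 1, MPI_INT, other, 0, MPI_COMM_WORLD, &status);
        MPI_Send(&rank, 1, MPI_INT, other, 0, MPI_COMM_WORLD);

        printf("Task %d received %d\n", rank, msg);
        MPI_Finalize();
        return 0;
    }

    Reordering the calls so that one task sends first, or switching to MPI_Sendrecv or non-blocking MPI_Isend/MPI_Irecv, removes this particular deadlock.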


This completes the exercise.

Evaluation Form: Please complete the online evaluation form if you have not already done so for this tutorial.
