MPI_REDUCE, MPI_Reduce

Purpose

Applies a reduction operation to the vector sendbuf over the set of tasks specified by comm and places the result in recvbuf on root.

C synopsis

#include <mpi.h>
int MPI_Reduce(void* sendbuf, void* recvbuf, int count,
               MPI_Datatype datatype, MPI_Op op, int root, MPI_Comm comm);

C++ synopsis

#include <mpi.h>
void MPI::Comm::Reduce(const void* sendbuf, void* recvbuf, int count,
                       const MPI::Datatype& datatype, const MPI::Op& op, int root) const;

FORTRAN synopsis

include 'mpif.h' or use mpi
MPI_REDUCE(CHOICE SENDBUF, CHOICE RECVBUF, INTEGER COUNT,
           INTEGER DATATYPE, INTEGER OP, INTEGER ROOT, INTEGER COMM,
           INTEGER IERROR)

Description

This subroutine applies a reduction operation to the vector sendbuf over the set of tasks specified by comm and places the result in recvbuf on root.

The input buffer and the output buffer have the same number of elements of the same datatype. The arguments sendbuf, count, and datatype define the send (input) buffer. The arguments recvbuf, count, and datatype define the output buffer. MPI_REDUCE is called by all group members using the same arguments for count, datatype, op, and root.

If a sequence of elements is provided to a task, the reduction operation is executed element-wise on each entry of the sequence. For example, if the operation is MPI_MAX and the send buffer contains two floating-point elements (count = 2 and datatype = MPI_FLOAT), then recvbuf(1) = global max(sendbuf(1)) and recvbuf(2) = global max(sendbuf(2)).

Users can define their own operations or use the predefined operations provided by MPI. User-defined operations can be overloaded to operate on several datatypes, either basic or derived. The datatype argument of MPI_REDUCE must be compatible with op. See IBM Parallel Environment for AIX: MPI Programming Guide for a list of the MPI predefined operations.

The "in place" option for intracommunicators is specified by passing the value MPI_IN_PLACE to the sendbuf argument at the root. In this case, the input data is taken at the root from the receive buffer, where it is replaced by the output data.

If comm is an intercommunicator, the call involves all tasks in the intercommunicator, but with one group (group A) defining the root task. All tasks in the other group (group B) pass the same value in the root argument, which is the rank of the root in group A. The root itself passes the value MPI_ROOT in root. All other tasks in group A pass the value MPI_PROC_NULL in root. Only send buffer arguments are significant in group B, and only receive buffer arguments are significant at the root. MPI_IN_PLACE is not supported for intercommunicators.

When you use this subroutine in a threads application, make sure all collective operations on a particular communicator occur in the same order at each task. See IBM Parallel Environment for AIX: MPI Programming Guide for more information on programming with MPI in a threads environment.

Parameters

sendbuf
    is the address of the send buffer (choice) (IN)
recvbuf
    is the address of the receive buffer (choice, significant only at root) (OUT)
count
    is the number of elements in the send buffer (integer) (IN)
datatype
    is the datatype of elements of the send buffer (handle) (IN)
op
    is the reduction operation (handle) (IN)
root
    is the rank of the root task (integer) (IN)
comm
    is the communicator (handle) (IN)
IERROR
    is the FORTRAN return code. It is always the last argument.
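The following sketch is illustrative only and is not part of the synopses above. It assumes MPI_COMM_WORLD, a root of task 0, and hypothetical buffer names; it matches the MPI_MAX example in Description (count = 2, datatype = MPI_FLOAT).

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        float send[2], recv[2];
        int rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Each task contributes two floats: count = 2, datatype = MPI_FLOAT. */
        send[0] = (float)rank;
        send[1] = (float)(10 * rank);

        /* Element-wise maximum over all tasks; the result is placed only on
           the root (task 0).  recvbuf is significant only at the root. */
        MPI_Reduce(send, recv, 2, MPI_FLOAT, MPI_MAX, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("global max: %f %f\n", recv[0], recv[1]);

        MPI_Finalize();
        return 0;
    }

With the "in place" option described above, the root would instead pass MPI_IN_PLACE as sendbuf and place its own contribution in recvbuf before the call; the other tasks call MPI_Reduce unchanged.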
Notes

See IBM Parallel Environment for AIX: MPI Programming Guide.

The MPI standard urges implementations to use the same evaluation order for reductions every time, even if this reduces performance. PE MPI instead adjusts its reduce algorithms for optimal performance on a given task distribution; the standard suggests, but does not mandate, this sacrifice of performance, and PE MPI puts performance ahead of the standard's recommendation. This means that two runs with the same task count may produce results that differ in the least significant bits, due to rounding effects when the evaluation order changes. Two runs that use the same task count and the same distribution of tasks across nodes will always give identical results.

In the 64-bit library, this function uses a shared memory optimization among the tasks on a node. This optimization is discussed in the chapter "Using shared memory" of IBM Parallel Environment for AIX: MPI Programming Guide, and is enabled by default. This optimization is not available to 32-bit programs.

Errors

Fatal errors:

Invalid count
    count < 0
Invalid datatype
Type not committed
Invalid op
Invalid root
    For an intracommunicator: root < 0 or root >= groupsize
    For an intercommunicator: root < 0 and is neither MPI_ROOT nor MPI_PROC_NULL, or root >= groupsize of the remote group
Invalid communicator
Unequal message lengths
Invalid use of MPI_IN_PLACE
MPI not initialized
MPI already finalized

Develop mode error if:

Inconsistent op
Inconsistent datatype
Inconsistent root
Inconsistent message length

Related information

MPE_IREDUCE
MPI_ALLREDUCE
MPI_OP_CREATE
MPI_REDUCE_SCATTER
MPI_SCAN
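As noted in Description, users can define their own reduction operations (see MPI_OP_CREATE under Related information). The sketch below is not part of the IBM documentation; it shows one way a commutative user-defined operation might be registered and used with MPI_Reduce, with hypothetical function and variable names and the assumption that the data are MPI_FLOAT.

    #include <mpi.h>
    #include <stdio.h>

    /* Hypothetical user function: element-wise product.  Its signature must
       match MPI_User_function: (invec, inoutvec, len, datatype). */
    static void my_prod(void *invec, void *inoutvec, int *len, MPI_Datatype *datatype)
    {
        float *in = (float *)invec;
        float *inout = (float *)inoutvec;
        int i;

        for (i = 0; i < *len; i++)
            inout[i] = in[i] * inout[i];   /* sketch assumes datatype is MPI_FLOAT */
    }

    int main(int argc, char *argv[])
    {
        MPI_Op my_op;
        float send[2], recv[2];
        int rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        send[0] = (float)(rank + 1);
        send[1] = 2.0f;

        /* Register the operation; commute = 1 declares it commutative. */
        MPI_Op_create(my_prod, 1, &my_op);

        /* A user-defined op is passed to MPI_Reduce like a predefined one. */
        MPI_Reduce(send, recv, 2, MPI_FLOAT, my_op, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("products: %f %f\n", recv[0], recv[1]);

        MPI_Op_free(&my_op);
        MPI_Finalize();
        return 0;
    }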