SSP Project Summary: MPI Datatypes Toolset
One of the major difficulties in using MPI is defining derived
datatypes that match application data structures. Because MPI is a
library, it has no knowledge of the data layout chosen by the
compiler, so layout information has to be acquired explicitly and fed
into the MPI datatype constructors. This can be laborious and
error-prone. These problems could be alleviated by an MPI datatypes
toolset, which could provide assistance in two ways:
- compiler-level creation of MPI derived datatypes matching
complex language types.
- library support for creation of MPI derived datatypes matching
common data sub-structures.
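To make the difficulty concrete, the sketch below (plain C, no MPI
needed) shows the layout information that has to be gathered by hand
for a small, hypothetical structure `struct Sample`: a block length
and a displacement per field, which, together with one MPI type per
field, are the arrays a hand-written `MPI_Type_struct` call consumes.
The displacements must come from the compiler (here via `offsetof`),
because simply summing field sizes ignores any padding the compiler
inserts.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example structure; the compiler decides its padding. */
struct Sample {
    char   tag;
    double value;
    int    count[2];
};

/* Block lengths: number of elements in each field. */
static const int sample_blocklens[3] = {1, 1, 2};

/* Size in bytes of each field. */
static const size_t sample_size[3] = {
    sizeof(char), sizeof(double), 2 * sizeof(int)
};

/* Actual displacements, taken from the compiler. With the block lengths
   and the MPI types MPI_CHAR, MPI_DOUBLE and MPI_INT, these are what a
   hand-written MPI_Type_struct call needs. */
static const size_t sample_disp[3] = {
    offsetof(struct Sample, tag),
    offsetof(struct Sample, value),
    offsetof(struct Sample, count)
};

/* Padding, in bytes, the compiler inserted before field i (0 < i < 3).
   A naive running sum of field sizes would miss exactly this amount. */
static size_t padding_before(int i)
{
    assert(i > 0 && i < 3);
    return sample_disp[i] - (sample_disp[i - 1] + sample_size[i - 1]);
}
```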
Even though reasonably clear examples are given in the MPI standard
and elsewhere, defining datatypes to match C structures is not
elegant. The formulaic nature of this procedure suggests developing a
pre-processing tool that generates functions or macros to create MPI
datatypes for marked structure definitions, perhaps similar to the
automatic generation of XDR (eXternal Data Representation)
encoding/decoding code by the rpcgen tool. An example of a marked C
structure, followed by the prototype of the function that would be
generated for it, is given below. In principle, the technique can
also be applied to Fortran 77 COMMON blocks and Fortran 90 derived
types. It will require a parser that locates the marked sections and
parses the type definitions within them. To create a matching MPI
derived datatype, the tool needs the name of the type and the name and
C type of each of its fields; the field displacements need not be
parsed, because the generated code can compute them at run time (for
example with MPI_Address). A sensible initial limitation would be to
disallow recursive types.
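A minimal sketch of the per-field parsing step, assuming each field is
written as a single declaration of the form `TYPE NAME;` or
`TYPE NAME[COUNT];` (no comma-separated declarators, nested
structures or bit-fields). The names `struct field_decl`,
`copy_trimmed` and `parse_field_decl` are hypothetical; a production
tool would more likely be grammar-driven. Pointer fields are rejected,
since the addresses they hold mean nothing in another process.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical result of parsing one field declaration. */
struct field_decl {
    char ctype[64];  /* e.g. "unsigned char" */
    char name[64];   /* e.g. "dest" */
    char count[64];  /* array length expression, or "1" for scalars */
};

/* Copy n characters of s into dst, trimming surrounding whitespace.
   Returns -1 if the trimmed text is empty or does not fit. */
static int copy_trimmed(char *dst, size_t dstsize, const char *s, size_t n)
{
    while (n > 0 && isspace((unsigned char)*s)) { s++; n--; }
    while (n > 0 && isspace((unsigned char)s[n - 1])) n--;
    if (n == 0 || n >= dstsize)
        return -1;
    memcpy(dst, s, n);
    dst[n] = '\0';
    return 0;
}

/* Parse "TYPE NAME;" or "TYPE NAME[COUNT];". Returns 0 on success and
   -1 for anything outside this restricted form (including pointers). */
static int parse_field_decl(const char *decl, struct field_decl *out)
{
    const char *semi = strchr(decl, ';');
    const char *bracket = strchr(decl, '[');
    const char *end, *name_start;

    if (semi == NULL)
        return -1;
    if (bracket != NULL && bracket < semi) {
        const char *close = strchr(bracket, ']');
        if (close == NULL || close > semi)
            return -1;
        if (copy_trimmed(out->count, sizeof out->count, bracket + 1,
                         (size_t)(close - bracket - 1)) != 0)
            return -1;
        end = bracket;
    } else {
        strcpy(out->count, "1");
        end = semi;
    }
    /* The field name is the identifier immediately before '[' or ';'. */
    while (end > decl && isspace((unsigned char)end[-1]))
        end--;
    name_start = end;
    while (name_start > decl &&
           (isalnum((unsigned char)name_start[-1]) || name_start[-1] == '_'))
        name_start--;
    if (copy_trimmed(out->name, sizeof out->name, name_start,
                     (size_t)(end - name_start)) != 0)
        return -1;
    /* Everything before the name is the C type. */
    if (copy_trimmed(out->ctype, sizeof out->ctype, decl,
                     (size_t)(name_start - decl)) != 0)
        return -1;
    if (strchr(out->ctype, '*') != NULL)
        return -1;
    return 0;
}
```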
#pragma MPIDT_STRUCT begin
struct MsgObj
{ unsigned char dest;
char data[MSG_LEN];
int crc;
};
#pragma MPIDT_STRUCT end
int MPIDT_create_struct_MsgObj(MPI_Datatype *newtype);
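As an illustration of what the generated function might contain, the
sketch below is a small C generator that writes the source of
`MPIDT_create_struct_<name>` into a buffer from a list of field
descriptions. It assumes MPI-1.1 calls (`MPI_Address`,
`MPI_Type_struct` and the `MPI_UB` marker; later MPI versions replace
these with `MPI_Get_address`, `MPI_Type_create_struct` and
`MPI_Type_create_resized`) and commits the type before returning. The
helper names, the C-to-MPI type table and the choice to commit inside
the generated function are assumptions, not part of the proposal.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical field description, as produced by the parsing stage. */
struct field_desc {
    const char *ctype;  /* basic C type, e.g. "unsigned char" */
    const char *name;   /* field name, e.g. "data" */
    const char *count;  /* element count expression: "1" or e.g. "MSG_LEN" */
};

/* Map a basic C type to an MPI-1 predefined datatype; NULL if unsupported. */
static const char *mpi_type_for(const char *ctype)
{
    static const char *const table[][2] = {
        {"char", "MPI_CHAR"},   {"unsigned char", "MPI_UNSIGNED_CHAR"},
        {"short", "MPI_SHORT"}, {"int", "MPI_INT"},
        {"long", "MPI_LONG"},   {"unsigned", "MPI_UNSIGNED"},
        {"float", "MPI_FLOAT"}, {"double", "MPI_DOUBLE"},
    };
    size_t i;
    for (i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(ctype, table[i][0]) == 0)
            return table[i][1];
    return NULL;
}

/* Append formatted text at buf + *len; returns -1 if it does not fit. */
static int emit(char *buf, size_t size, size_t *len, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(buf + *len, size - *len, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t)n >= size - *len)
        return -1;
    *len += (size_t)n;
    return 0;
}

/* Write the source of MPIDT_create_struct_<sname> into buf. A trailing
   MPI_UB entry at sizeof(struct) makes the datatype's extent equal to the
   C structure size. Displacements are measured from a real instance, so
   compiler padding is handled. Returns 0 on success, -1 on an unsupported
   field type or a too-small buffer. */
static int generate_struct_fn(const char *sname, const struct field_desc *f,
                              int nf, char *buf, size_t size)
{
    size_t len = 0;
    int i, err = 0;

    assert(nf > 0 && buf != NULL);
    if (size == 0)
        return -1;
    buf[0] = '\0';
    err |= emit(buf, size, &len,
                "int MPIDT_create_struct_%s(MPI_Datatype *newtype)\n{\n"
                "    struct %s s;\n    int i, rc;\n    MPI_Aint base, disps[%d];\n"
                "    int blocklens[%d] = {", sname, sname, nf + 1, nf + 1);
    for (i = 0; i < nf; i++)
        err |= emit(buf, size, &len, "%s, ", f[i].count);
    err |= emit(buf, size, &len, "1};\n    MPI_Datatype types[%d] = {", nf + 1);
    for (i = 0; i < nf; i++) {
        const char *t = mpi_type_for(f[i].ctype);
        if (t == NULL)
            return -1;  /* e.g. pointer or nested struct field */
        err |= emit(buf, size, &len, "%s, ", t);
    }
    err |= emit(buf, size, &len, "MPI_UB};\n\n    MPI_Address(&s, &base);\n");
    for (i = 0; i < nf; i++)
        err |= emit(buf, size, &len, "    MPI_Address(&s.%s, &disps[%d]);\n",
                    f[i].name, i);
    err |= emit(buf, size, &len,
                "    for (i = 0; i < %d; i++)\n        disps[i] -= base;\n"
                "    disps[%d] = (MPI_Aint)sizeof(struct %s);\n"
                "    rc = MPI_Type_struct(%d, blocklens, disps, types, newtype);\n"
                "    if (rc == MPI_SUCCESS)\n        rc = MPI_Type_commit(newtype);\n"
                "    return rc;\n}\n", nf, nf, sname, nf + 1);
    return err ? -1 : 0;
}
```

For the `MsgObj` example, the field list
`{"unsigned char", "dest", "1"}, {"char", "data", "MSG_LEN"},
{"int", "crc", "1"}` produces a function that describes four entries
(three fields plus the `MPI_UB` marker) to `MPI_Type_struct`.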
It is possible to provide a portable library of calls that create
datatypes for regular structures, such as a column of an array, in a
form that can be composed as though it were a basic type. (It is
relatively easy to define an MPI derived datatype whose stride covers
the correct distance in an array, but to compose this datatype, for
example to send several consecutive columns using a count greater than
one, its extent must be adjusted to that of a single element. In MPI-1
this adjustment is possible, using an MPI_UB marker in a struct
datatype, but it requires a detailed understanding of MPI datatype
construction.) The library should use only standard MPI datatype
constructors, and so will be portable to any MPI implementation on any
platform.
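The extent problem can be checked with simple arithmetic. The sketch
below works in units of elements rather than bytes and assumes a C
(row-major) array of `nrows` by `ncols` elements; a column is then
`MPI_Type_vector(nrows, 1, ncols, ...)`, whose natural extent is
`(nrows - 1) * ncols + 1` elements. The helper names are hypothetical.
Listing the elements touched by consecutive copies of the type shows
why the extent must be reduced to one element before the type can be
composed.

```c
#include <assert.h>
#include <stddef.h>

/* Natural extent, in elements, of a vector type made of `count` blocks of
   `blocklen` elements whose starts are `stride` elements apart. */
static size_t vector_extent(size_t count, size_t blocklen, size_t stride)
{
    assert(count > 0 && blocklen > 0);
    return (count - 1) * stride + blocklen;
}

/* Element offsets touched when `ncopies` copies of a column type of an
   nrows x ncols row-major array are laid out `extent` elements apart
   (as happens when the type is used with a count of ncopies).
   Writes nrows * ncopies offsets to out[] and returns that number. */
static size_t column_copy_offsets(size_t nrows, size_t ncols, size_t ncopies,
                                  size_t extent, size_t out[])
{
    size_t c, r, n = 0;
    for (c = 0; c < ncopies; c++)
        for (r = 0; r < nrows; r++)
            out[n++] = c * extent + r * ncols;
    return n;
}
```

For a 3 by 4 array the natural extent is 9 elements, so a second copy
starts at element 9 and runs off the end of the 12-element array
(offsets 9, 13, 17); with the extent reduced to 1, the second copy is
exactly the next column (offsets 1, 5, 9).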
Library development may benefit from a survey of how MPI datatypes
are used in various application areas. As an example, the prototype
of a datatype constructor for data arrays, analogous to the sub-space
partitioning of Cartesian process topologies (MPI_CART_SUB), is given
below.
MPIDT_ARRAY_SUB(eltype, ndims, dims, remain, newtype)
  IN   eltype    basic datatype of element (handle)
  IN   ndims     number of array dimensions (integer)
  IN   dims      integer array of size ndims specifying the array
                 size in each dimension
  IN   remain    logical array of size ndims specifying the
                 dimensions covered by newtype
  OUT  newtype   new datatype (handle)
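The semantics of MPIDT_ARRAY_SUB are not pinned down above, so the
following is only a reference model under stated assumptions: the
array is stored in C (row-major) order, dimensions with `remain[d]`
false are fixed at index 0, and the function lists, in elements, the
offsets that `newtype` would cover. `array_sub_offsets` and
`MAX_DIMS` are hypothetical. An implementation could build the actual
type by nesting `MPI_Type_hvector` calls over the retained
dimensions; a model like this would be useful for testing it.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DIMS 8  /* arbitrary limit for this sketch */

/* Reference model (assumed semantics) for MPIDT_ARRAY_SUB: writes to out[]
   the element offsets, in C row-major order, covered by the sub-array made
   of the dimensions with remain[d] != 0, every other dimension being fixed
   at index 0. Returns the number of offsets written; out[] must hold the
   product of dims[d] over the retained dimensions. */
static size_t array_sub_offsets(int ndims, const int dims[],
                                const int remain[], size_t out[])
{
    size_t stride[MAX_DIMS];
    int idx[MAX_DIMS] = {0};
    size_t n = 0;
    int d;

    assert(ndims > 0 && ndims <= MAX_DIMS);

    /* Row-major strides in elements: the last dimension varies fastest. */
    stride[ndims - 1] = 1;
    for (d = ndims - 2; d >= 0; d--)
        stride[d] = stride[d + 1] * (size_t)dims[d + 1];

    for (;;) {
        size_t off = 0;
        for (d = 0; d < ndims; d++)
            off += (size_t)idx[d] * stride[d];
        out[n++] = off;

        /* Odometer step over the retained dimensions, last one fastest;
           non-retained dimensions stay at index 0. */
        for (d = ndims - 1; d >= 0; d--) {
            if (remain[d] && ++idx[d] < dims[d])
                break;
            idx[d] = 0;
        }
        if (d < 0)
            return n;
    }
}
```

With `dims = {3, 4}`, `remain = {0, 1}` selects a row (offsets 0 to 3)
and `remain = {1, 0}` a column (offsets 0, 4, 8).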
The suggested schedule for this project is as follows:
- Week 1: Course: C programming, MPI, code management, LaTeX
- Week 2: Familiarisation with advanced MPI datatypes
- Weeks 3-4: Development of simple C struct MPI datatype generation
- Week 5: Testing and feedback (use in SSP code?)
- Weeks 6-7: One of: development of F77 COMMON block MPI datatype
  generation; development of F90 derived type MPI datatype
  generation; or extension of C struct MPI datatype generation
- Week 8: Testing and feedback (use in SSP code?)
- Week 9: (Development/testing of C/F77 extended datatype library)
- Week 10: Documentation and report
Expertise Required
The emphasis of the project will be on developing the compiler-level
tools, which requires developing a parser. This will probably involve
the lex (lexical analyser generator) and yacc (parser generator)
tools, and so will require C programming skills. Knowledge of
compiler techniques would be useful, suggesting a student with a
Computer Science background.
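To give a feel for the lexical side of the work, the sketch below
locates the first region enclosed by the `#pragma MPIDT_STRUCT begin`
and `#pragma MPIDT_STRUCT end` markers in a source string. It is a
hand-written stand-in for what a lex specification would typically do
with start conditions; `find_marked_section` is a hypothetical name,
and the sketch does not check that the pragmas start a line, nor does
it ignore markers inside comments or string literals.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BEGIN_MARK "#pragma MPIDT_STRUCT begin"
#define END_MARK   "#pragma MPIDT_STRUCT end"

/* Find the first marked section in src. On success, sets *start to the
   first character of the line after the begin pragma and *len to the
   number of characters before the end pragma, and returns 0. Returns -1
   if no complete section is found. */
static int find_marked_section(const char *src, const char **start,
                               size_t *len)
{
    const char *b, *body, *e;

    assert(src != NULL && start != NULL && len != NULL);
    b = strstr(src, BEGIN_MARK);
    if (b == NULL)
        return -1;
    body = strchr(b, '\n');  /* the section starts on the next line */
    if (body == NULL)
        return -1;
    body++;
    e = strstr(body, END_MARK);
    if (e == NULL)
        return -1;           /* unterminated section */
    *start = body;
    *len = (size_t)(e - body);
    return 0;
}
```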
The extended datatype constructor library could be implemented in
either C or Fortran. If the project were to include a survey of how
applications use MPI datatypes, some knowledge of computational
science would be useful. This part of the project could be made more
suitable for a student from the Physical Sciences, although
restricting the project to library development alone might not
provide enough work for the full project.
Resources Required
The tools will be developed on workstations and can be tested on any
MPI platform (EPCC or X-lab workstations, Meiko systems, Cray T3D).
There is no requirement for visualisation capability. The
compiler-level work will probably require the UNIX tools lex (or
flex) and yacc (or bison), which are available on at least all EPCC
workstations.
Resources Supplied
The compiler-level tool may benefit from publicly available grammars
for C, Fortran 77 and Fortran 90 data structure definitions; sources
are known for at least C and Fortran 77. The language syntax
definitions will also be required. Access to a set of existing MPI
applications (perhaps resulting from other SSP projects) would be
valuable, particularly for a survey of MPI datatype use but also for
testing any part of the project.
References
- MPI: A Message-Passing Interface Standard, version 1.1, 12 June 1995.
- comp.compilers FAQ (Frequently Asked Questions).
- Levine, Mason and Brown, lex & yacc. O'Reilly & Associates, Inc.
- rpcgen(1) manual page; XDR and RPC Language definitions.
Werner Augustin worked on this project.
A compressed PostScript version of the project's final report
(53 kbytes) is available.