SSP 1994 project summary:
[EPCC home] [SSP home] [2001 projects] [2000 projects] [1999 projects] [1998 projects] [1997 projects] [1996 projects] [1995 projects] [1994 projects] [1993 projects]

Recognising and Avoiding Thrashing in Dynamic Load Balancing

Irregular mesh computations are often parallelised by breaking the mesh up into pieces, and giving one piece to each process. This should be done so that each process has a roughly equivalent amount of work to do. However, as a computation proceeds, the workload over the mesh may change, leading to load imbalance problems.

Dynamic load balancing uses information about the workload on each process in an application and adjusts the borders of their local meshes so as to achieve a more even distribution. Thrashing is the phenomenom by which repeated iterations of the load balancing move elements in an unhelpful way, for example by continuously swapping the same set of border elements back and forth between two neighbouring processes.

This project investigates different kinds of thrashing in dynamic load balancing systems, using the PUL-SM library for irregular mesh applications, and the dynamic load balancing functions it provides. The aim is to develop a quantitative measure of the level of thrashing within an application. The accuracy of this measure can be broadly assessed by comparison with qualitative techniques. These will involve direct observation of the load balancing behaviour of an application, potentially through the use of video. A quantitative measure of the thrashing in a system will be more generally useful, and suitable for 3D meshes.

If possible, techniques for avoiding thrashing will be investigated. These could be preventative, involving careful use of the powerful control features provided by the PUL-SM load balancing functions, or curative, based on detecting and avoiding thrashing as an application executes.


Frank Stangenberg worked on this project.

Compressed PostScript of Frank's final report is available here (1160768 bytes) .

Webpage maintained by mario@epcc.ed.ac.uk