High-Efficiency Specialized Support for Dense Linear Algebra Arithmetic in LuNA System

Automatic synthesis of efficient scientific parallel programs for supercomputers is, in general, a complex problem of system parallel programming. Specialized synthesis algorithms and heuristics are therefore useful. The LuNA system for automatic construction of distributed parallel programs provides a basis for accumulating such algorithms, so that high-quality parallel programs can be generated in particular subject domains. If LuNA has no specialized support for a given input, it falls back on the general synthesis algorithm, which does construct the required program but may yield unsatisfactory efficiency. This paper presents a specialized run-time system for LuNA that supports the implementation of dense linear algebra operations on distributed-memory multicomputers. Experimental results demonstrate that automatically generated parallel programs of this class outperform the corresponding ScaLAPACK library subroutines, which makes the LuNA system practically applicable for generating high-performance distributed parallel programs for supercomputers in the dense linear algebra application class.

