Performance Engineering for HPC: Models generating insights

Staff - Faculty of Informatics

Start date: 29 March 2017

End date: 30 March 2017

Speaker:

Gerhard Wellein

 

University of Erlangen-Nuremberg, Germany

Date:

Wednesday, March 29, 2017

Place:

USI Lugano Campus, room SI-006, informatics building (Via G. Buffi 13)

Time:

13:30

 

 

Abstract:

We consider Performance Engineering (PE) as a structured, iterative process for code optimization and parallelization. The key ingredient is a white-box performance model which provides insights into the interaction between the code and the hardware. The model identifies the actual performance-limiting factors ("bottlenecks"), allowing for a selection of appropriate code changes. Once the impact of the code changes is validated, the process restarts with a new bottleneck identified by the performance model. Since this model-based approach provides a thorough understanding of the impact of hardware features on code performance, it is also useful in various other areas such as performance reproducibility, performance prediction for future architectures, or education and training. The talk will first introduce our PE concept and survey basic "white-box" performance models. Focusing on work performed in the "Equipping Sparse Solvers for Exascale" (ESSEX) project, we will demonstrate various aspects of PE in the context of sparse eigenvalue solvers for quantum physics applications. Here a thorough understanding of modern hardware concepts led to the proposal of a new sparse matrix data format, which delivers high performance for many matrix structures on all modern HPC compute devices (multicore CPUs, Intel Xeon Phi, Nvidia GPGPUs). The benefit of using (simple) analytic models in performance optimization is demonstrated for a Kernel Polynomial Method (KPM) based solver, which computes the spectral density of large sparse matrices. By designing specific kernel operations and applying blocking on interleaved vectors, this sparse KPM solver has been accelerated by a factor of 3-4 on the node level, delivering about 10% of peak performance on various generations of modern Intel CPUs and Nvidia GPGPUs.
These improvements finally enabled us to achieve sustained performance in the PetaFlop/s range for large-scale (heterogeneous) KPM calculations on sparse matrices from quantum physics applications.
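
As a rough illustration of what a simple analytic performance model can look like, the sketch below uses a roofline-style bound: attainable performance is the smaller of the peak compute rate and the memory bandwidth multiplied by the code's computational intensity. The abstract does not say which models the talk covers, so the choice of this model is an assumption. The machine numbers are hypothetical, and the intensity figure assumes a sparse matrix-vector multiply that streams one 8-byte value and one 4-byte column index per nonzero while ignoring vector traffic.

```python
def roofline_gflops(peak_gflops, mem_bw_gbs, flops_per_byte):
    """Roofline-style upper bound on performance in GFlop/s.

    The bound is the smaller of the compute ceiling (peak_gflops) and the
    memory ceiling (bandwidth in GB/s times flops per byte of data traffic).
    """
    return min(peak_gflops, mem_bw_gbs * flops_per_byte)


# Hypothetical machine parameters, chosen only for illustration.
peak = 500.0   # GFlop/s
bandwidth = 50.0  # GB/s

# Assumed kernel: sparse matrix-vector multiply with 2 flops per nonzero
# (one multiply, one add) and at least 12 bytes of traffic per nonzero
# (8-byte double value + 4-byte column index), vector traffic ignored.
intensity = 2.0 / 12.0  # flops per byte

bound = roofline_gflops(peak, bandwidth, intensity)
fraction_of_peak = bound / peak

print(f"Predicted bound: {bound:.2f} GFlop/s "
      f"({100.0 * fraction_of_peak:.1f}% of peak)")
```

Under these assumed numbers the kernel is limited by memory bandwidth rather than by the compute units, which is the kind of bottleneck identification the abstract describes. Changes that raise the flops done per byte loaded, such as the blocking mentioned above, raise this bound.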

Acknowledgment
This work was supported by the German Research Foundation (DFG) through the Priority Program 1648 "Software for Exascale Computing" under project ESSEX (see https://blogs.fau.de/essex/).

 

 

Biography:

Gerhard Wellein holds a PhD in Physics from the University of Bayreuth and is a regular Professor at the Department for Computer Science at the University of Erlangen-Nuremberg. He heads the HPC group at the Erlangen Regional Computing Center (RRZE) and has more than ten years of experience in teaching HPC techniques to students and scientists from Computational Science and Engineering. His research interests include solving large sparse eigenvalue problems, novel parallelization approaches, performance engineering, and architecture-specific optimization.

 

 

Host:

Prof. Olaf Schenk