Node-Level Performance Modeling of Sparse Factorization Solver
Decanato - Facoltà di scienze informatiche
Data: 16 Luglio 2021 / 10:30 - 13:00
You are cordially invited to attend the PhD Dissertation Defence of Radim Janalík on Friday 16 July 2021 at 10:30 on Teams.
Solving large sparse linear systems is at the heart of many application problems arising from scientific and engineering problems. These systems are often solved by direct factorization solvers, especially when the system needs to be solved for multiple right-hand sides or when a high numerical precision is required. Direct solvers are based on matrix factorization, which is then followed by forward and backward substitution to obtain a precise solution. The factorization is the most computationally intensive step, but it has to be computed only once for a given matrix. Then the system is solved with forward and backward substitution for every right-hand side. Performance modeling of algorithms involved in solving these linear systems reveals the computational bottlenecks, which can guide node-level performance optimizations and shows the best performance that can be achieved on given architecture. In this thesis we investigate and analyze the performance of the forward/backward solution process of the PARDISO direct sparse solver and present a detailed performance analysis for its sparse solver kernel. This analysis is based on the Berkeley roofline model, a model that is widely used to predict the upper bound of a code based on processor peak performance and memory bandwidth. We establish a modified roofline model that captures the serial and parallel execution phases which allows us to predict the in-socket scaling over the processor cores. The distinction of serial and parallel execution is important as the amount of the serial fraction depends on the matrix used and can have a significant negative impact on performance. We compared the roofline model with an alternative Erlangen ECM model and provide discussion on usability and modeling capabilities of both models. The model predictions are compared with various measurements for a representative set of sparse matrices on different x86_64 processors. The performance analysis and modeling performed in this work are limited to a single node, however, the code considered here is also a building block for the MPI parallel version. Hence, also the distributed memory implementation of PARDISO will profit from any enhancement achieved.
- Prof. Olaf Schenk, Università della Svizzera italiana, Switzerland (Research Advisor)
- Prof. Illia Horenko, Università della Svizzera italiana, Switzerland (Internal Member)
- Prof. Igor Pivkin, Università della Svizzera italiana, Switzerland (Internal Member)
- Prof. Harald Köstler, Friedrich–Alexander University Erlangen-Nurnberg, Germany (External Member)
- Prof. Tomas Kozubek, Technical University of Ostrava, Czech Republic (External Member)