Abstract
In this work, we deal with the QR factorization of block-tridiagonal matrices, where the blocks are dense and rectangular. This work is motivated by a novel method for computing geodesics over Riemannian manifolds. If blocks are reduced sequentially along the diagonal, only limited parallelism is available. We propose a matrix permutation approach based on the nested dissection method which improves parallelism at the cost of additional computations and storage. We show how operations can be arranged to keep this extra cost as low as possible. We provide a detailed analysis of the approach showing that this extra cost is bounded. Finally, we present an implementation for shared memory systems which relies on task parallelism and makes use of a runtime system. Experimental results support the conclusions of our analysis and show that the proposed approach leads to good performance and scalability.
Original language | English |
---|---|
Journal | SIAM Journal on Scientific Computing |
Volume | 42 |
Issue number | 6 |
Pages (from-to) | C313-C334 |
ISSN | 1064-8275 |
DOIs | |
Publication status | Published - 2020 |
Keywords
- QR factorization
- Nested dissection
- Task-based parallelism