New Level-3 BLAS Kernels for Cholesky Factorization

Fred G. Gustavson, Jerzy Wasniewski, José R. Herrero

    Research output: Chapter in Book/Report/Conference proceeding; Article in proceedings; Research; peer-review

    Abstract

    Some linear algebra libraries use Level-2 routines during the factorization part of any Level-3 block factorization algorithm. We discuss four Level-3 routines called DPOTF3, a new type of BLAS, for the factorization part of a block Cholesky factorization algorithm for use by LAPACK routine DPOTRF or for BPF (Blocked Packed Format) Cholesky factorization. All four DPOTF3 routines are written in Fortran. Our main result is that the performance of the DPOTF3 routines is still increasing when the performance of the Level-2 LAPACK routine DPOTF2 starts to decrease. This means that the performance of DGEMM, DSYRK, and DTRSM will increase, both because they can use larger block sizes and because they make fewer passes over the matrix elements. We present corroborating performance results for DPOTF3 versus DPOTF2 on a variety of common platforms. The four DPOTF3 routines are based on simple register blocking; different platforms have different numbers of registers, so our four routines use different register blockings. Blocked Packed Format (BPF) is discussed. LAPACK routines for -POTRF and -PPTRF using BPF instead of full and packed format are shown to be trivial modifications of LAPACK -POTRF source codes. Upper BPF is shown to be identical to square block packed format. Performance results for DBPTRF and DPOTRF for large n show that the DPOTF3 routines do increase performance for large n.
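
The block structure the abstract refers to can be illustrated with a minimal sketch. This is not the authors' Fortran code and does not reproduce DPOTF3 or its register blocking. It is a plain-Python, right-looking blocked Cholesky, written under the assumption that a dense symmetric positive definite matrix is stored as a list of rows. The function names `factor_diagonal_block`, `solve_panel`, `update_trailing`, and `blocked_cholesky` are hypothetical; they only mark where a factorization kernel (the role DPOTF2 or DPOTF3 plays), a DTRSM-like solve, and a DSYRK/DGEMM-like update would sit.

```python
import math


def factor_diagonal_block(a, k, b):
    """Unblocked Cholesky of the b-by-b diagonal block starting at (k, k).

    This is the step a factorization kernel performs inside a block algorithm.
    """
    for j in range(k, k + b):
        s = a[j][j] - sum(a[j][p] * a[j][p] for p in range(k, j))
        if s <= 0.0:
            raise ValueError("matrix is not positive definite")
        a[j][j] = math.sqrt(s)
        for i in range(j + 1, k + b):
            t = sum(a[i][p] * a[j][p] for p in range(k, j))
            a[i][j] = (a[i][j] - t) / a[j][j]


def solve_panel(a, k, b, n):
    """Solve X * L11^T = A21 for the rows below the diagonal block (DTRSM-like)."""
    for i in range(k + b, n):
        for j in range(k, k + b):
            t = sum(a[i][p] * a[j][p] for p in range(k, j))
            a[i][j] = (a[i][j] - t) / a[j][j]


def update_trailing(a, k, b, n):
    """Lower-triangle update A22 -= L21 * L21^T (DSYRK/DGEMM-like)."""
    for i in range(k + b, n):
        for j in range(k + b, i + 1):
            a[i][j] -= sum(a[i][p] * a[j][p] for p in range(k, k + b))


def blocked_cholesky(matrix, nb):
    """Return lower-triangular L with L * L^T == matrix, using block size nb."""
    n = len(matrix)
    a = [[float(x) for x in row] for row in matrix]  # work on a copy
    for k in range(0, n, nb):
        b = min(nb, n - k)
        factor_diagonal_block(a, k, b)
        solve_panel(a, k, b, n)
        update_trailing(a, k, b, n)
    for i in range(n):  # clear the unused upper triangle
        for j in range(i + 1, n):
            a[i][j] = 0.0
    return a
```

For example, `blocked_cholesky([[4, 12, -16], [12, 37, -43], [-16, -43, 98]], 2)` returns the factor with rows `[2, 0, 0]`, `[6, 1, 0]`, `[-8, 5, 3]`. A larger `nb` shifts more of the work into the diagonal-block kernel and reduces the number of panel and trailing-update sweeps, which is the trade-off the abstract describes.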
    Original language: English
    Title of host publication: Parallel Processing and Applied Mathematics : 9th International Conference, PPAM 2011, Torun, Poland, September 11-14, 2011. Revised Selected Papers, Part I
    Publisher: Springer
    Publication date: 2012
    Pages: 60-69
    ISBN (Print): 978-3-642-31463-6
    ISBN (Electronic): 978-3-642-31464-3
    Publication status: Published - 2012
    Event: Parallel Processing and Applied Mathematics. 9th International Conference, PPAM 2011 - Torun, Poland
    Duration: 11 Sep 2011 to 14 Sep 2011
    http://ppam.pl/

    Conference

    Conference: Parallel Processing and Applied Mathematics. 9th International Conference, PPAM 2011
    Country: Poland
    City: Torun
    Period: 11/09/2011 to 14/09/2011
    Series: Lecture Notes in Computer Science
    Volume: 7203
    ISSN: 0302-9743
