High performance linear algebra algorithms: An introduction

F.G. Gustavson, Jerzy Wasniewski

    Research output: Chapter in Book/Report/Conference proceedingArticle in proceedingsResearch


    his Mini-Symposium consisted of two back to back sessions, each consisting of five presentations, held on the afternoon of Monday, June 21, 2004. A major theme of both sessions was novel data structures for the matrices of dense linear algebra, DLA. Talks one to four of session one all centered on new data layouts of matrices. Cholesky factorization was considered in the first three talks and a contrast was made between a square block hybrid format, a recursive packed format and the two standard data structures of DLA, full and packed format. In both talks one and two, the first two new formats led to level three high performance implementations of Cholesky factorization while using exactly the same amount of storage that standard packed format required. Of course, full format requires twice the amount of storage of the other three formats. In talk one, John Reid presented a definitive study of Cholesky factorization using a standard block based iterative Cholesky factorization, [1]. This factorization is typical of Lapack type factorizations; the major difference of [1] is the type of data structure it uses: talk one uses square blocks of order NB to represent a lower (upper) triangular matrix. In talk two, Jerzy Waśniewski presented the recursive packed format and its related Cholesky factorization algorithm, [2]. This novel format gave especially good Cholesky performance for very large matrices. In talk three, Jerzy Waśniewski demonstrated a detailed tuning strategy for talk one and presented performance results on six important platforms, Alpha, IBM, Intel, Itanium, SGI and Sun. The performance runs covered the algorithms of talks one and two as well as Lapack’s full and packed Cholesky codes, [3]. Overall, the square block hybrid method was best but was not a clear winner. The recursive method suffered because it did not combine blocking with recursion, [4]. Talk four, presented by Fred Gustavson, had a different flavor. Another novel data format was described which showed that two standard full format arrays could represent a triangular or symmetric matrix using only a total storage that was equal to the storage of standard packed storage, [5]. Therefore, this format has the same desirable property of standard full format arrays: one can use standard level 3 BLAS, [6] as well as some 125 or so full format Lapack symmetric / triangular routines on it. Thus new codes written for the new format are trivial to produce as they mostly consist of just calls to already existing codes. The last talk of session one, by James Sexton was on the massively
    Original languageEnglish
    Title of host publicationApplied Parallel Computing, state of the art in scientific computing : 8th international workshop, PARA 2006, Umeaa, Sweden, June 18-21, 2006 ; revised selected papers
    Number of pages1192
    Place of Publication225-227
    Publication date2006
    ISBN (Print)3-54075-75-46
    Publication statusPublished - 2006
    Event8th International Workshop on Applied Parallel Computing: State of the Art in Scientific Computing - Umeå, Sweden
    Duration: 18 Jun 200621 Jun 2006
    Conference number: 8


    Workshop8th International Workshop on Applied Parallel Computing
    SeriesLecture Notes in Computer Science
    Numbervol. 3732

    Fingerprint Dive into the research topics of 'High performance linear algebra algorithms: An introduction'. Together they form a unique fingerprint.

    Cite this