Auto-tuning Dense Vector and Matrix-vector Operations for Fermi GPUs

    Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

    Abstract

    In this paper, we consider the automatic performance tuning of dense vector and matrix-vector operations on GPUs. Such operations form the backbone of level 1 and level 2 routines in the Basic Linear Algebra Subroutines (BLAS) library and are therefore of great importance in many scientific applications. As examples, we develop single-precision CUDA kernels for the Euclidean norm (SNRM2) and the matrix-vector multiplication (SGEMV). The target hardware is the most recent Nvidia Tesla 20-series (Fermi architecture). We show that auto-tuning can be successfully applied to achieve high performance for dense vector and matrix-vector operations by appropriately utilizing the fine-grained parallelism of the GPU. Our tuned kernels deliver between 25% and 100% better performance than the current CUBLAS 3.2 library.
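    The paper's own kernels are not reproduced here, but the following CUDA sketch illustrates the kind of tuning space the abstract refers to for SNRM2: a partial-reduction kernel for the sum of squares, parameterized by a block size and a work-per-thread factor that an auto-tuner would sweep to find the fastest configuration for a given GPU. The kernel name and both tuning parameters are hypothetical and chosen only for illustration.

    ```cuda
    // Illustrative sketch only, not the authors' kernel.
    // BLOCK_SIZE (a power of two) and WORK_PER_THREAD are hypothetical
    // tuning parameters of the kind an auto-tuner would sweep.
    #include <cuda_runtime.h>

    template <int BLOCK_SIZE, int WORK_PER_THREAD>
    __global__ void snrm2_partial(const float* __restrict__ x, int n,
                                  float* __restrict__ block_sums)
    {
        __shared__ float sdata[BLOCK_SIZE];

        // Each thread accumulates WORK_PER_THREAD elements, strided by
        // BLOCK_SIZE so that neighbouring threads access neighbouring
        // elements (coalesced loads).
        float sum = 0.0f;
        int idx = blockIdx.x * BLOCK_SIZE * WORK_PER_THREAD + threadIdx.x;
        for (int k = 0; k < WORK_PER_THREAD; ++k) {
            if (idx < n) sum += x[idx] * x[idx];
            idx += BLOCK_SIZE;
        }
        sdata[threadIdx.x] = sum;
        __syncthreads();

        // Shared-memory tree reduction within the block.
        for (int s = BLOCK_SIZE / 2; s > 0; s >>= 1) {
            if (threadIdx.x < s) sdata[threadIdx.x] += sdata[threadIdx.x + s];
            __syncthreads();
        }
        if (threadIdx.x == 0) block_sums[blockIdx.x] = sdata[0];
    }
    // The host would sum block_sums and take sqrtf to obtain the norm;
    // an auto-tuner times this kernel over a grid of (BLOCK_SIZE,
    // WORK_PER_THREAD) values and keeps the fastest pair for the target GPU.
    ```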
    Original language: English
    Title of host publication: Parallel Processing and Applied Mathematics: 9th International Conference, PPAM 2011
    Editors: Roman Wyrzykowski, Jack Dongarra, Konrad Karczewski, Jerzy Wasniewski
    Publisher: Springer
    Publication date: 2012
    Pages: 619-629
    Publication status: Published - 2012
    Event: Parallel Processing and Applied Mathematics. 9th International Conference, PPAM 2011 - Torun, Poland
    Duration: 11 Sept 2011 - 14 Sept 2011
    http://ppam.pl/

    Conference

    Conference: Parallel Processing and Applied Mathematics. 9th International Conference, PPAM 2011
    Country/Territory: Poland
    City: Torun
    Period: 11/09/2011 - 14/09/2011
    Internet address: http://ppam.pl/
    Series: Lecture Notes in Computer Science
    Volume: 7203
    ISSN: 0302-9743

    Keywords

    • GPU
    • BLAS
    • Dense linear algebra
    • Parallel algorithms
