High-Performance Matrix-Vector Multiplication on the GPU

    Research output: Chapter in Book/Report/Conference proceeding; Article in proceedings; Research; peer-review

    Abstract

    In this paper, we develop a high-performance GPU kernel for one of the most popular dense linear algebra operations: matrix-vector multiplication. The target hardware is the most recent Nvidia Tesla 20-series (Fermi architecture), which was designed from the ground up for scientific computing. We show that achieving high performance for dense matrix-vector multiplication is essentially a matter of fully utilizing the fine-grained parallelism of the many-core GPU. We also show that auto-tuning can be successfully applied to the GPU kernel so that it performs well for all matrix shapes and sizes.
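
    The two ideas in the abstract, splitting each row's dot product among cooperating threads and choosing the split by auto-tuning, can be illustrated with a small sequential sketch. This is not the paper's kernel: the names `matvec_tiled`, `autotune`, and `threads_per_row`, the candidate values, and the matrix sizes are all assumptions made for illustration, and plain-Python timings say nothing about real GPU performance.

```python
import time


def matvec_tiled(A, x, threads_per_row):
    """Compute y = A x, splitting each row's dot product into
    `threads_per_row` strided partial sums that are then reduced.
    This mimics, sequentially, several GPU threads sharing one row."""
    y = []
    for row in A:
        partial = [0.0] * threads_per_row
        for t in range(threads_per_row):
            # Hypothetical "thread" t handles columns t, t + threads_per_row, ...
            for j in range(t, len(row), threads_per_row):
                partial[t] += row[j] * x[j]
        y.append(sum(partial))  # reduction of the partial sums
    return y


def autotune(A, x, candidates=(1, 2, 4, 8), repeats=3):
    """Time each candidate split and return the fastest one for this shape."""
    best, best_time = None, float("inf")
    for c in candidates:
        start = time.perf_counter()
        for _ in range(repeats):
            matvec_tiled(A, x, c)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best, best_time = c, elapsed
    return best


# Tune separately for a tall and a wide matrix (assumed example sizes).
tall = [[float(i + j) for j in range(4)] for i in range(64)]
wide = [[float(i + j) for j in range(64)] for i in range(4)]
tuned = {
    "tall": autotune(tall, [1.0] * 4),
    "wide": autotune(wide, [1.0] * 64),
}
```

    The point of the sketch is the structure: the same kernel is parameterized, and the parameter is selected per matrix shape by measurement rather than fixed in advance.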
    Original language: English
    Title of host publication: Euro-Par 2011
    Publisher: Springer
    Publication date: 2012
    Pages: 377-386
    ISBN (Print): 978-3-642-29736-6
    ISBN (Electronic): 978-3-642-29737-3
    Publication status: Published - 2012
    Event: Euro-Par 2011 - Bordeaux II University, Bordeaux, France
    Duration: 29 Aug 2011 to 2 Sep 2011
    http://europar2011.bordeaux.inria.fr/

    Conference

    Conference: Euro-Par 2011
    Location: Bordeaux II University
    Country/Territory: France
    City: Bordeaux
    Period: 29/08/2011 to 02/09/2011
    Series: Lecture Notes in Computer Science
    Number: 7155
    ISSN: 0302-9743

    Keywords

    • GPU
    • Matrix-Vector Multiplication
    • Dense linear algebra
