High-Performance Matrix-Vector Multiplication on the GPU

Publication: Research, peer-reviewed · Article in proceedings · Annual report year: 2012


In this paper, we develop a high-performance GPU kernel for one of the most widely used dense linear algebra operations: matrix-vector multiplication. The target hardware is the most recent Nvidia Tesla 20-series (Fermi architecture), which is designed from the ground up for scientific computing. We show that achieving high performance for dense matrix-vector multiplication is essentially a matter of fully utilizing the fine-grained parallelism of the many-core GPU. We further show that auto-tuning can be successfully applied to the GPU kernel so that it performs well for all matrix shapes and sizes.
Original language: English
Title of host publication: Euro-Par 2011
Publisher: Springer
Publication date: 2012
Pages: 377-386
ISBN (print): 978-3-642-29736-6
ISBN (electronic): 978-3-642-29737-3
DOIs
State: Published

Conference

Conference: Euro-Par 2011
Country: France
City: Bordeaux
Period: 29/08/11 – 02/09/11
Internet address: http://europar2011.bordeaux.inria.fr/
Name: Lecture Notes in Computer Science
Number: 7155
ISSN (Print): 0302-9743
Citations: Web of Science® Times Cited: No match on DOI

Keywords

  • GPU
  • Matrix-vector multiplication
  • Dense linear algebra
