High-Performance Matrix-Vector Multiplication on the GPU
Publication: Research - peer-review › Article in proceedings – Annual report year: 2012
In this paper, we develop a high-performance GPU kernel for one of the most popular dense linear algebra operations, matrix-vector multiplication. The target hardware is the most recent Nvidia Tesla 20-series (Fermi architecture), which is designed from the ground up for scientific computing. We show that achieving high performance for dense matrix-vector multiplication is essentially a matter of fully utilizing the fine-grained parallelism of the many-core GPU. We also show that auto-tuning can be successfully applied to the GPU kernel so that it performs well for all matrix shapes and sizes.
| Original language | English |
|---|---|
| Title of host publication | Euro-Par 2011 |
| Publisher | Springer |
| Publication date | 2012 |
| Pages | 377-386 |
| ISBN (print) | 978-3-642-29736-6 |
| ISBN (electronic) | 978-3-642-29737-3 |
| DOIs | |
| State | Published |
Conference
| Conference | Euro-Par 2011 |
|---|---|
| Country | France |
| City | Bordeaux |
| Period | 29 August 2011 → 2 September 2011 |
| Internet address | http://europar2011.bordeaux.inria.fr/ |
Series
| Name | Lecture Notes in Computer Science |
|---|---|
| Number | 7155 |
| ISSN (Print) | 0302-9743 |
Keywords
- GPU, Matrix-Vector Multiplication, Dense linear algebra
ID: 9862150