Abstract
GPUs have become an integral part of high-performance scientific computing, since they offer
dedicated parallel hardware that can substantially accelerate the execution of many scientific applications.
In this talk, I will consider the automatic performance acceleration of dense vector and matrix-vector operations
on GPUs. Such operations form the backbone of the level 1 and level 2 routines in the Basic Linear Algebra
Subprograms (BLAS) library and are therefore of great importance in many scientific applications. The target
hardware is the most recent NVIDIA Tesla 20-series (Fermi architecture). Most of the techniques I discuss
for accelerating dense linear algebra apply to memory-bound GPU algorithms in general.
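The claim that level 1 and level 2 BLAS operations are memory-bound can be illustrated with a back-of-envelope arithmetic-intensity calculation. The sketch below is not from the talk; it uses approximate vendor figures for a Tesla C2050 (a Fermi-generation card) purely for illustration:

```python
# Arithmetic intensity (flops per byte of off-chip traffic) for two
# double-precision BLAS kernels, compared with the machine balance of
# a Tesla C2050 (Fermi). Hardware numbers are approximate vendor
# figures, used here only to illustrate why these kernels are
# bandwidth-limited rather than compute-limited.

def arithmetic_intensity(flops, bytes_moved):
    """Flops performed per byte moved to/from device memory."""
    return flops / bytes_moved

n = 10_000

# Level 1 axpy: y <- a*x + y.
# 2n flops; traffic: read x and y, write y = 3n doubles (8 bytes each).
axpy_ai = arithmetic_intensity(2 * n, 3 * n * 8)

# Level 2 gemv: y <- A*x + y.
# 2n^2 flops; traffic dominated by streaming A once = n^2 doubles.
gemv_ai = arithmetic_intensity(2 * n * n, n * n * 8)

# Machine balance: peak flops the card can sustain per byte of bandwidth.
peak_gflops = 515   # approx. C2050 double-precision peak (GFLOP/s)
peak_bw_gbs = 144   # approx. C2050 memory bandwidth (GB/s)
balance = peak_gflops / peak_bw_gbs

print(f"axpy intensity:  {axpy_ai:.3f} flop/byte")
print(f"gemv intensity:  {gemv_ai:.3f} flop/byte")
print(f"machine balance: {balance:.2f} flop/byte")
```

Both kernel intensities come out well below the machine balance, so attainable performance is bounded by memory bandwidth rather than arithmetic throughput, which is why optimization effort for these routines goes into maximizing effective bandwidth.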
Original language | English |
---|---|
Publication date | 2011 |
Publication status | Published - 2011 |
Event | Accelerating Computations: Workshop, Aarhus, Denmark. Duration: 1 Jan 2011 → … |
Conference
Conference | Accelerating Computations: Workshop |
---|---|
City | Aarhus |
Country | Denmark |
Period | 01/01/2011 → … |