GPUs have already become an integral part of high-performance scientific computing, since they offer dedicated parallel hardware that can potentially accelerate the execution of many scientific applications. In this talk, I will consider the automatic performance acceleration of dense vector and matrix-vector operations on GPUs. Such operations form the backbone of the level 1 and level 2 routines in the Basic Linear Algebra Subprograms (BLAS) library and are therefore of great importance in many scientific applications. The target hardware is the most recent NVIDIA Tesla 20-series (Fermi architecture), but most of the techniques I discuss for accelerating dense linear algebra are applicable to memory-bound GPU algorithms in general.
Publication status: Published - 2011
Event: Accelerating Computations: Workshop, Aarhus, Denmark
Duration: 1 Jan 2011 → …