### Abstract

This report is intended as a quick introduction to the OpenCL framework and the aim is to facilitate a smooth transfer into the use OpenCL C for developers with previous GPGPU experience. The purpose of OpenCL is to allow for developers to use all compute resources available on a heterogeneous hardware platform.

As well as being an introduction to OpenCL, the report also presents an overview of AMD GPU hardware, covering both the VLIW5/4 architectures and the upcoming Graphics-Core-Next architecture which is to form the basis of AMDs future generation GPUs that are to be as capable at compute as they are at graphics.

To conclude the presentation of OpenCL as a language for compute, a matrix-matrix multiplication example is devised and optimized for the VLIW4, Tesla and Fermi architectures. The performance is measured as a function of both matrix and work-group size and results are discussed. Where applicable, the equivalent CUDA implementation is tested for comparison.

As well as being an introduction to OpenCL, the report also presents an overview of AMD GPU hardware, covering both the VLIW5/4 architectures and the upcoming Graphics-Core-Next architecture which is to form the basis of AMDs future generation GPUs that are to be as capable at compute as they are at graphics.

To conclude the presentation of OpenCL as a language for compute, a matrix-matrix multiplication example is devised and optimized for the VLIW4, Tesla and Fermi architectures. The performance is measured as a function of both matrix and work-group size and results are discussed. Where applicable, the equivalent CUDA implementation is tested for comparison.

Original language | English |
---|

Publisher | Technical University of Denmark |
---|---|

Number of pages | 49 |

Publication status | Published - 2012 |

Series | D T U Compute. Technical Report |
---|---|

Number | 2012-05 |

ISSN | 1601-2321 |

## Cite this

Nielsen, A. S., Engsig-Karup, A. P., & Dammann, B. (2012).

*Parallel Programming using OpenCL on Modern Architectures*. Technical University of Denmark. D T U Compute. Technical Report, No. 2012-05