2019
Conference Paper
Title

GPU GEMM-Kernel Autotuning for scalable machine learners

Abstract
Deep learning (DL) is one of the key technologies in the artificial intelligence (AI) domain. Deep learning neural networks (DLNN) profit greatly from the overall exponential growth of data, while on the other hand the computational effort for training and inference increases strongly. Most of the computational time in DLNN is consumed by the convolution step, which is based on a general matrix multiplication (GEMM). To reduce the computational time of DLNN, different highly optimized GEMM implementations for Graphics Processing Units (GPUs) have been presented in recent years [1]. Most of these approaches are GPU-hardware-specific implementations of the GEMM software kernel and do not take into account how performance depends on the training data layout. To achieve maximum performance, the parameters of the GEMM algorithm have to be tuned for the specific GPU hardware and the specific data layout of the training task. In this paper we present a two-step autotuning approach for GPU-based GEMM algorithms. In the first step, the kernel parameter search space is pruned by several performance criteria; in the second, the remaining candidates are processed by a modified Simulated Annealing in order to find the best kernel parameter combinations with respect to the GPU hardware and the task-specific data layout. Our experiments were carried out on 160 different input problems. With the proposed approach, an average speedup of around 12 over the state-of-the-art implementation from NVIDIA (cuBLAS) can be achieved on an NVIDIA GTX 1080 Ti accelerator card.
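The two-step idea described in the abstract (prune the search space, then search the remainder with simulated annealing) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the parameter names (`tile_m`, `tile_n`, `tile_k`, `thr`), the pruning rules, the shared-memory budget, and the `toy_cost` function are all assumptions made for illustration, and the annealing loop is the textbook variant rather than the modified one used in the paper. In a real tuner, `toy_cost` would be replaced by timing a generated kernel on the target GPU.

```python
import math
import random
from itertools import product

# Hypothetical tuning parameters; the paper's actual parameter set is not given here.
TILE_SIZES = [16, 32, 64, 128]   # candidate tile_m / tile_n values
TILE_K = [8, 16, 32]             # candidate tile_k values
THREAD_TILE = [1, 2, 4, 8]       # output elements per thread along each dimension

MAX_THREADS_PER_BLOCK = 1024     # CUDA thread-block limit on current GPUs
SHARED_MEM_BYTES = 48 * 1024     # assumed per-block shared-memory budget


def is_feasible(cfg):
    """Step 1: cheap static checks that discard configurations up front."""
    tile_m, tile_n, tile_k, thr = cfg
    if tile_m % thr or tile_n % thr:
        return False
    threads = (tile_m // thr) * (tile_n // thr)
    if threads > MAX_THREADS_PER_BLOCK or threads % 32:
        return False  # too many threads, or not a whole number of warps
    smem = 4 * tile_k * (tile_m + tile_n)  # float32 tiles of A and B
    return smem <= SHARED_MEM_BYTES


def prune(space):
    """Keep only the configurations that pass the static checks."""
    return [cfg for cfg in space if is_feasible(cfg)]


def toy_cost(cfg, problem):
    """Made-up stand-in for a measured kernel runtime (lower is better)."""
    tile_m, tile_n, tile_k, thr = cfg
    m, n, k = problem
    blocks = math.ceil(m / tile_m) * math.ceil(n / tile_n)
    k_steps = math.ceil(k / tile_k)
    loads = blocks * k_steps * tile_k * (tile_m + tile_n)
    flops = blocks * k_steps * tile_k * tile_m * tile_n
    return flops / thr + 8.0 * loads


def neighbour(cfg, candidates, rng):
    """Pick a feasible configuration that differs in exactly one parameter."""
    options = [c for c in candidates
               if sum(a != b for a, b in zip(c, cfg)) == 1]
    return rng.choice(options) if options else cfg


def anneal(candidates, problem, cost=toy_cost, steps=500,
           t0=1.0, alpha=0.99, seed=0):
    """Step 2: plain simulated annealing over the pruned candidates."""
    rng = random.Random(seed)
    current = rng.choice(candidates)
    current_cost = cost(current, problem)
    best, best_cost = current, current_cost
    temp = t0
    for _ in range(steps):
        nxt = neighbour(current, candidates, rng)
        nxt_cost = cost(nxt, problem)
        # Relative change keeps the temperature scale independent of problem size.
        delta = (nxt_cost - current_cost) / current_cost
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = nxt, nxt_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        temp *= alpha  # geometric cooling schedule
    return best, best_cost


full_space = list(product(TILE_SIZES, TILE_SIZES, TILE_K, THREAD_TILE))
candidates = prune(full_space)
best_cfg, best_cost = anneal(candidates, problem=(1024, 1024, 512))
print(len(full_space), "configurations,", len(candidates), "after pruning")
print("best configuration found:", best_cfg)
```

Because the cost depends on the problem shape `(m, n, k)`, running the search once per input problem mirrors the abstract's point that the best kernel parameters depend on the data layout as well as on the hardware.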
Author(s)
Sailer, Johannes
Frey, Christian  
Kühnert, Christian  
Mainwork
Machine Learning for Cyber Physical Systems. Selected papers from the International Conference ML4CPS 2018  
Conference
Conference on Machine Learning for Cyber-Physical-Systems and Industry 4.0 (ML4CPS) 2018  
Open Access
Rights
CC BY 4.0: Creative Commons Attribution
DOI
10.24406/publica-r-403674
10.1007/978-3-662-58485-9_8
Language
English
Fraunhofer-Institut für Optronik, Systemtechnik und Bildauswertung IOSB  
Keyword(s)
  • GPU
  • Matrix Multiplication
  • Autotuning
  • automatic generation
  • acceleration
  • CUDA
  • BLAS
