Extending PluTo for Multiple Devices by Integrating OpenACC
For many years now, processor vendors have increased the performance of their devices by adding more cores and wider vector units to their CPUs instead of scaling up the processors' clock frequency. Moreover, GPUs have become popular for solving problems that demand even more parallel compute power. Exploiting the full potential of modern compute devices requires specialized code, which is often written in a hardware-specific manner. Usually, code written for CPUs is not usable on GPUs and vice versa. The programming API OpenACC tries to close this gap by making a single code base suitable and optimized for many devices. Nevertheless, OpenACC is rarely used by "standard programmers", and while different code transformers (like PluTo) allow for (semi-)automatic code parallelization for multi-core CPUs, they generally do not support OpenACC yet. We present first promising results of our PluTo extension, which generates parallelized code using OpenACC. Using our transformer, we create programs that exploit the parallelism of different platforms without any manual modifications. We achieve speedups of up to 100 in comparison to the original unoptimized programs and speedups of 2.05 in comparison to equivalently generated OpenMP code.