Here you will find scientific publications from the Fraunhofer Institutes.

Design of a High-Performance Tensor-Vector Multiplication with BLAS

Bassoy, Cem


Rodrigues, J.:
Computational science - ICCS 2019. Part 1 : 19th international conference, Faro, Portugal, June 12-14, 2019 : proceedings
Cham: Springer International Publishing, 2019 (Lecture Notes in Computer Science 11536)
ISBN: 978-3-030-22733-3
ISBN: 978-3-030-22734-0
International Conference on Computational Science (ICCS) <19, 2019, Faro>
Fraunhofer IOSB

Tensor contraction is an important mathematical operation in many scientific computing applications that use tensors to store massive multidimensional data. Based on the Loops-over-GEMMs (LOG) approach, this paper discusses the design of high-performance algorithms for the mode-q tensor-vector multiplication using efficient implementations of the matrix-vector multiplication (GEMV). Given dense tensors with any non-hierarchical storage format, tensor order, and dimensions, the proposed algorithms either call GEMV directly with the tensors or recursively apply GEMV multiple times to higher-order tensor slices. We analyze strategies for loop fusion and for parallel execution of slice-vector multiplications with higher-order tensor slices. Using OpenBLAS, our parallel implementation attains 34.8 Gflop/s in single precision on an Intel Core i9-7900X processor. Our parallel version of the tensor-vector multiplication is on average 6.1x and up to 12.6x faster than state-of-the-art approaches.
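The abstract's central idea, mapping a mode-q tensor-vector product onto GEMV calls, can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes a dense tensor in first-order (column-major) layout, a 1-based mode q, and one particular slicing strategy in which all modes above q are fused into a single loop. The helper `gemv` is a hypothetical pure-Python stand-in for a BLAS routine such as `sgemv`, and the names `ttv`, `n_left`, and `n_right` are illustrative only.

```python
from math import prod
from typing import List, Sequence


def gemv(m: int, n: int, a: Sequence[float], off: int, lda: int,
         x: Sequence[float], trans: bool = False) -> List[float]:
    """Hypothetical stand-in for BLAS GEMV on a column-major m x n matrix
    starting at a[off] with leading dimension lda.
    Returns A @ x (length m), or A^T @ x (length n) if trans is True."""
    if trans:
        # y_j = sum_i A[i, j] * x_i
        return [sum(a[off + i + j * lda] * x[i] for i in range(m))
                for j in range(n)]
    y = [0.0] * m
    for j in range(n):  # column-oriented traversal, as in GEMV with 'N'
        xj = x[j]
        col = off + j * lda
        for i in range(m):
            y[i] += a[col + i] * xj
    return y


def ttv(a: Sequence[float], dims: Sequence[int], b: Sequence[float],
        q: int) -> List[float]:
    """Mode-q tensor-vector product of a dense tensor stored in first-order
    (column-major) layout; q is 1-based. The order-(p-1) result is returned
    in column-major layout as well."""
    p = len(dims)
    if not 1 <= q <= p or len(b) != dims[q - 1]:
        raise ValueError("invalid mode or vector length")
    n_left = prod(dims[:q - 1])   # modes 1..q-1 (equals the stride of mode q)
    n_q = dims[q - 1]
    n_right = prod(dims[q:])      # modes q+1..p, fused into one loop
    if q == 1:
        # Single GEMV: view A as an n_q x n_right matrix and compute A^T b.
        return gemv(n_q, n_right, a, 0, n_q, b, trans=True)
    # q > 1: n_right slices, each an n_left x n_q matrix (leading dim n_left).
    # For q == p, n_right == 1 and this reduces to one GEMV on the whole tensor.
    c: List[float] = []
    for j in range(n_right):
        c.extend(gemv(n_left, n_q, a, j * n_left * n_q, n_left, b))
    return c
```

For example, for a 2 x 3 x 4 tensor holding the values 0 to 23 in column-major order and b = [1, 1, 1], `ttv(list(range(24)), [2, 3, 4], [1, 1, 1], 2)` sums over the second mode and returns `[6.0, 9.0, 24.0, 27.0, 42.0, 45.0, 60.0, 63.0]`. The case q = 1 is handled separately because the general slicing would otherwise degenerate into many GEMV calls on single-row matrices. In an optimized version, the independent slice-vector products in the loop are the natural candidates for parallel execution.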