Design of a High-Performance Tensor-Vector Multiplication with BLAS

Bassoy, Cem

doi:10.1007/978-3-030-22734-0_3

2019

Conference Paper

Abstract

Tensor contraction is an important mathematical operation for many scientific computing applications that use tensors to store massive multidimensional data. Based on the Loops-over-GEMMs (LOG) approach, this paper discusses the design of high-performance algorithms for the mode-q tensor-vector multiplication using efficient implementations of the matrix-vector multiplication (GEMV). Given dense tensors with any non-hierarchical storage format, tensor order and dimensions, the proposed algorithms either call GEMV directly on the tensor or recursively apply GEMV to higher-order tensor slices multiple times. We analyze strategies for loop fusion and parallel execution of slice-vector multiplications with higher-order tensor slices. Using OpenBLAS, our parallel implementation attains 34.8 Gflop/s in single precision on an Intel Core i9-7900X processor. Our parallel version of the tensor-vector multiplication is on average 6.1x and up to 12.6x faster than state-of-the-art approaches.
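
To illustrate the idea of expressing a mode-q tensor-vector multiplication as a loop over matrix-vector products, here is a minimal pure-Python sketch. It is not the paper's implementation: it assumes a first-order (column-major) layout only, uses a hand-written stand-in for GEMV instead of a BLAS call, and the names `gemv_colmajor` and `ttv` are hypothetical. The tensor is viewed as a three-dimensional array of shape (left, n_q, right), so that each of the `right` slices is a left-by-n_q column-major matrix that is multiplied with the vector.

```python
from math import prod


def gemv_colmajor(m, n, a, offset, x):
    """Return y = M @ x, where M is the m-by-n column-major matrix
    stored in a[offset : offset + m * n]. Stand-in for a BLAS GEMV call."""
    y = [0.0] * m
    for j in range(n):
        xj = x[j]
        base = offset + j * m  # start of column j
        for i in range(m):
            y[i] += a[base + i] * xj
    return y


def ttv(a, shape, q, b):
    """Mode-q tensor-vector product (q is 1-based).

    a     : flat list holding the tensor in column-major order
    shape : tuple of dimensions (n_1, ..., n_p)
    b     : vector of length n_q
    Returns the flat result (column-major) and its shape.
    """
    if not 1 <= q <= len(shape):
        raise ValueError("mode q out of range")
    nq = shape[q - 1]
    if len(b) != nq:
        raise ValueError("vector length must equal n_q")

    left = prod(shape[:q - 1])   # product of dimensions before mode q
    right = prod(shape[q:])      # product of dimensions after mode q

    c = []
    # One matrix-vector product per slice; for q = p there is a single slice,
    # so the whole operation collapses to one GEMV.
    for r in range(right):
        c.extend(gemv_colmajor(left, nq, a, r * left * nq, b))

    return c, tuple(shape[:q - 1]) + tuple(shape[q:])
```

For a 2-by-3 matrix stored column-major as `[1, 2, 3, 4, 5, 6]`, `ttv(a, (2, 3), 2, [1, 1, 1])` sums along the second mode, and `ttv(a, (2, 3), 1, [1, 2])` contracts the first mode. In this sketch the slice loop iterations are independent of one another, which is the property that makes parallel execution of slice-vector multiplications possible.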

Author(s)

Bassoy, Cem

Mainwork

Computational science - ICCS 2019. Part 1

Conference

International Conference on Computational Science (ICCS) 2019
