Engineering a multi-core radix sort

Wassenberg, J.; Sanders, P.

2011

Conference Paper

Abstract

We present a fast radix sorting algorithm that builds upon a microarchitecture-aware variant of counting sort. Taking advantage of virtual memory and making use of write-combining yields a per-pass throughput corresponding to at least 89% of the system's peak memory bandwidth. Our implementation outperforms Intel's recently published radix sort by a factor of 1.64. It also compares favorably to the reported performance of an algorithm for Fermi GPUs when data-transfer overhead is included. These results indicate that scalar, bandwidth-sensitive sorting algorithms remain competitive on current architectures. Various other memory-intensive applications can benefit from the techniques described herein.
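The counting-sort pass with write-combining described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes non-negative integer keys, 8-bit digits, and a hypothetical staging-buffer size `BUF_SIZE` standing in for one cache line of keys. The names `counting_sort_pass` and `radix_sort` are our own. Python cannot express the non-temporal stores or the virtual-memory techniques the paper relies on, so the sketch shows only the control flow: keys are staged in small per-bucket buffers and copied to the output a whole buffer at a time.

```python
from typing import List

RADIX_BITS = 8                 # assumed digit width: 256 buckets per pass
NUM_BUCKETS = 1 << RADIX_BITS
MASK = NUM_BUCKETS - 1
BUF_SIZE = 8                   # hypothetical buffer size, standing in for one cache line of keys


def counting_sort_pass(src: List[int], shift: int) -> List[int]:
    """One stable counting-sort pass on the digit (key >> shift) & MASK."""
    # 1. Histogram: count how many keys fall into each bucket.
    counts = [0] * NUM_BUCKETS
    for key in src:
        counts[(key >> shift) & MASK] += 1

    # 2. Exclusive prefix sum gives each bucket's first output index.
    offsets = [0] * NUM_BUCKETS
    total = 0
    for b in range(NUM_BUCKETS):
        offsets[b] = total
        total += counts[b]

    # 3. Scatter through per-bucket staging buffers (software write-combining):
    #    a key goes into its bucket's buffer, and the buffer is copied to the
    #    output as one block only when it is full.
    dst = [0] * len(src)
    buffers: List[List[int]] = [[] for _ in range(NUM_BUCKETS)]
    for key in src:
        b = (key >> shift) & MASK
        buf = buffers[b]
        buf.append(key)
        if len(buf) == BUF_SIZE:
            start = offsets[b]
            dst[start:start + BUF_SIZE] = buf  # copies the elements
            offsets[b] += BUF_SIZE
            buf.clear()

    # 4. Flush the partially filled buffers that remain.
    for b, buf in enumerate(buffers):
        if buf:
            start = offsets[b]
            dst[start:start + len(buf)] = buf
            offsets[b] += len(buf)
    return dst


def radix_sort(keys: List[int], key_bits: int = 32) -> List[int]:
    """LSD radix sort of non-negative integers below 2**key_bits."""
    data = list(keys)
    for shift in range(0, key_bits, RADIX_BITS):
        data = counting_sort_pass(data, shift)
    return data
```

For example, `radix_sort([170, 45, 75, 90, 802, 24, 2, 66])` returns `[2, 24, 45, 66, 75, 90, 170, 802]`. Because each buffer preserves insertion order and is flushed to consecutive positions within its bucket, every pass is stable, which LSD radix sort requires. In a native implementation the buffers would be cache-line aligned, so that each flush writes a complete line; here they only reproduce the ordering logic.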

Author(s)

Wassenberg, J.

Sanders, P.

Host publication

Euro-Par 2011 Parallel Processing. 17th International Conference, Euro-Par 2011. Vol. 2

Conference

International Conference on Parallel Processing (Euro-Par) 2011
