Here you will find scientific publications from the Fraunhofer institutes.

GCMR: A GPU cluster-based MapReduce framework for large-scale data processing

Authors: Guo, Y.; Liu, W.; Gong, B.; Voss, G.; Mueller-Wittig, W.


Institute of Electrical and Electronics Engineers -IEEE-; IEEE Computer Society; International Federation for Information Processing -IFIP-:
15th IEEE International Conference on High Performance Computing and Communications, HPCC 2013 and 11th IEEE/IFIP International Conference on Embedded and Ubiquitous Computing, EUC 2013. Proceedings. Vol.1 : Zhangjiajie, Hunan Province, P.R. China, 13-15 November 2013
Los Alamitos, Calif.: IEEE Computer Society Conference Publishing Services (CPS), 2013
ISBN: 978-0-7695-5088-6
ISBN: 978-1-4799-0973-5
International Conference on High Performance Computing and Communications (HPCC) <15, 2013, Zhangjiajie>
International Conference on Embedded and Ubiquitous Computing (EUC) <11, 2013, Zhangjiajie>
Fraunhofer IDM@NTU

MapReduce is a popular programming model for parallel and distributed large-scale data processing. There have been many efforts to implement this model on commodity GPU-based systems. However, most of these implementations run on only a single GPU and cannot process large-scale datasets. In this paper, we present a new approach to designing a MapReduce framework on GPU clusters for large-scale data processing. We use the Compute Unified Device Architecture (CUDA) and MPI parallel programming models to implement this framework. To map the computation efficiently onto GPU clusters, we introduce a two-level parallelization approach: inter-node parallelization and intra-node parallelization. Furthermore, to improve overall MapReduce efficiency, a multi-threading scheme overlaps communication and computation on a multi-GPU node. Compared to previous GPU-based MapReduce implementations, our implementation, called GCMR, achieves speedups of up to 2.6x on a single node and up to 9.1x on 4 nodes of a Tesla S1060 quad-GPU cluster system when processing small datasets. It also shows very good scalability when processing large-scale datasets on the cluster system.
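The abstract names three mechanisms: inter-node partitioning (via MPI), intra-node partitioning across the GPUs of one node (via CUDA), and a multi-threading scheme that overlaps communication with computation. The following is a hypothetical, CPU-only sketch of how these pieces could fit together for a word-count job. It is not GCMR's implementation: nodes are simulated by a loop standing in for MPI ranks, each "GPU" is a Python thread, and a bounded queue of size one stands in for double-buffered host-to-device transfers. All function names (`gcmr_like_wordcount`, `node_task`, `gpu_worker`, `split`, `map_fn`) are made up for this illustration.

```python
"""Hypothetical sketch of two-level MapReduce with communication/computation
overlap. Illustrative names only; not GCMR's API. Nodes and GPUs are simulated."""
import queue
import threading
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def split(items: List[str], parts: int) -> List[List[str]]:
    """Round-robin split of items into `parts` lists (some may be empty)."""
    return [items[i::parts] for i in range(parts)]


def map_fn(record: str) -> Iterable[Tuple[str, int]]:
    # Word-count map: emit (word, 1) for every word in the record.
    for word in record.split():
        yield word, 1


def gpu_worker(batches: List[List[str]]) -> Dict[str, int]:
    """Simulated GPU: a loader thread stages the next batch (standing in for a
    host-to-device copy) while this thread maps the current batch."""
    staged: queue.Queue = queue.Queue(maxsize=1)  # at most one batch staged ahead

    def loader() -> None:
        for batch in batches:
            staged.put(list(batch))  # "transfer"; blocks while the buffer is full
        staged.put(None)             # sentinel: no more batches

    t = threading.Thread(target=loader)
    t.start()
    partial: Dict[str, int] = defaultdict(int)
    while (batch := staged.get()) is not None:
        for record in batch:
            for key, value in map_fn(record):
                partial[key] += value  # local combine before the shuffle
    t.join()
    return dict(partial)


def node_task(records: List[str], gpus_per_node: int, batch_size: int) -> Dict[str, int]:
    """Intra-node level: split this node's records across its GPUs."""
    per_gpu = split(records, gpus_per_node)
    results: List[Dict[str, int]] = [{} for _ in range(gpus_per_node)]

    def run(g: int) -> None:
        shard = per_gpu[g]
        batches = [shard[i:i + batch_size] for i in range(0, len(shard), batch_size)]
        results[g] = gpu_worker(batches)  # one host thread drives one GPU

    threads = [threading.Thread(target=run, args=(g,)) for g in range(gpus_per_node)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    merged: Dict[str, int] = defaultdict(int)
    for r in results:
        for k, v in r.items():
            merged[k] += v
    return dict(merged)


def gcmr_like_wordcount(records: List[str], num_nodes: int = 4,
                        gpus_per_node: int = 4, batch_size: int = 2) -> Dict[str, int]:
    # Inter-node level: scatter input across nodes (an MPI scatter in a real system).
    node_inputs = split(records, num_nodes)
    node_outputs = [node_task(inp, gpus_per_node, batch_size) for inp in node_inputs]
    # Shuffle: route each key to a reducer node by a deterministic hash
    # (an MPI all-to-all exchange in a real system); reduce by summing.
    buckets: List[Dict[str, int]] = [defaultdict(int) for _ in range(num_nodes)]
    for out in node_outputs:
        for key, value in out.items():
            buckets[zlib.crc32(key.encode()) % num_nodes][key] += value
    # Gather: buckets hold disjoint keys, so merging them is a plain update.
    result: Dict[str, int] = {}
    for b in buckets:
        result.update(b)
    return result


if __name__ == "__main__":
    docs = ["map reduce on gpu", "gpu cluster", "map map reduce"]
    print(sorted(gcmr_like_wordcount(docs).items()))
    # prints [('cluster', 1), ('gpu', 2), ('map', 3), ('on', 1), ('reduce', 2)]
```

The bounded queue is what produces the overlap in this sketch: the loader can stage batch i+1 while the worker maps batch i, but cannot run further ahead, mirroring a two-buffer scheme. In an actual GPU cluster the scatter and shuffle steps would be MPI collectives and the map would run as CUDA kernels; the Python threads here only show the control flow, not GPU-level speedup.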