Petaflop Seismic Simulations in the Public Cloud

Authors: Breuer, Alexander; Cui, Yifeng; Heinecke, Alexander


Weiland, Michèle:
High performance computing. 34th International Conference, ISC High Performance 2019. Proceedings : Frankfurt/Main, Germany, June 16-20, 2019
Cham: Springer Nature, 2019 (Lecture Notes in Computer Science 11501)
ISBN: 978-3-030-20655-0 (Print)
ISBN: 978-3-030-20656-7
ISBN: 3-030-20655-6
International Conference on High Performance Computing <34, 2019, Frankfurt/Main>
Fraunhofer ITWM

Over the last decade, cloud services and infrastructure as a service have become a popular solution for diverse applications. Additionally, hardware support for virtualization has closed performance gaps compared to on-premises, bare-metal systems. This development is driven by offloaded hypervisors and full CPU virtualization. Today’s cloud service providers, such as Amazon or Google, offer the ability to assemble application-tailored clusters to maximize performance. However, from an interconnect point of view, one has to tackle a 4–5× slowdown in terms of bandwidth and 25× in terms of latency compared to the latest high-speed, low-latency interconnects. Taking into account the high per-node and accelerator-driven performance of the latest supercomputers, we observe that the network-bandwidth performance of recent cloud offerings is within 2× of large supercomputers. To address these challenges, we present a comprehensive application-centric approach for high-order seismic simulations utilizing the ADER discontinuous Galerkin finite element method, which exhibits excellent communication characteristics. This covers the tuning of the operating system, which is normally not possible on supercomputers, micro-benchmarking, and finally the efficient execution of our solver in the public cloud. Due to this performance-oriented end-to-end workflow, we were able to achieve 1.09 PFLOPS on 768 AWS c5.18xlarge instances, offering 27,648 cores with 5 PFLOPS of theoretical computational power. This corresponds to an achieved peak efficiency of over 20% and a parallel efficiency of close to 90% in a weak-scaling setup. In terms of strong scalability, we were able to scale a science scenario from 2 to 64 instances with 60% parallel efficiency. This work is, to the best of our knowledge, the first of its kind at such a large scale.
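
The headline figures in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes the textbook definitions of peak efficiency and parallel efficiency, which may differ in detail from the ones used in the paper, and all function and variable names are hypothetical. The only inputs are the numbers quoted in the abstract. The theoretical peak of 5 PFLOPS is a rounded value, so the result is approximate.

```python
# Sanity-check of the figures quoted in the abstract.
# Assumption: standard textbook definitions of the efficiency metrics;
# the paper itself may define them slightly differently.


def peak_efficiency(achieved_pflops: float, theoretical_pflops: float) -> float:
    """Fraction of the theoretical peak that was actually sustained."""
    return achieved_pflops / theoretical_pflops


def strong_scaling_speedup(efficiency: float, n_base: int, n_scaled: int) -> float:
    """Speedup implied by a given strong-scaling parallel efficiency.

    With efficiency = speedup / (n_scaled / n_base), the speedup is
    efficiency * (n_scaled / n_base).
    """
    return efficiency * (n_scaled / n_base)


# Figures taken from the abstract.
instances = 768
total_cores = 27_648
achieved_pflops = 1.09
theoretical_pflops = 5.0  # rounded value as quoted

# Cores per instance implied by the quoted totals.
cores_per_instance = total_cores // instances

# Fraction of peak; the abstract states "over 20%".
fraction_of_peak = peak_efficiency(achieved_pflops, theoretical_pflops)

# Strong scaling from 2 to 64 instances at 60% parallel efficiency.
implied_speedup = strong_scaling_speedup(0.60, n_base=2, n_scaled=64)

print(f"cores per instance: {cores_per_instance}")
print(f"peak efficiency:    {fraction_of_peak:.1%}")
print(f"implied speedup:    {implied_speedup:.1f}x on 32x more instances")
```

Under these assumptions, the quoted totals imply 36 cores per instance, roughly 22% of theoretical peak, and a speedup of about 19× when moving from 2 to 64 instances.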