Evaluating fault-tolerant data organization for data-intensive cluster applications

Authors: Wegener, Dennis; Mock, Michael
Year: 2008
Record date: 2022-03-10
URL: https://publica.fraunhofer.de/handle/publica/359118
Language: en
Classification: 005
Type: presentation

Abstract: Data analysis has to cope with the continuous growth of data and computational requirements. Distributed computing lets users profit from parallel processing, and for data-intensive applications the data itself can be organized in a distributed manner. However, distributed data organization carries a higher risk of data loss. In this paper, we investigate the effects of storage faults on data-intensive parallel cluster applications. In particular, we evaluate different scenarios for organizing processing and data, based on technologies such as the Condor and Hadoop cluster management systems.