Evaluating fault-tolerant data organization for data-intensive cluster applications
Presentation held at the International Workshop on Dependable Network Computing and Mobile Systems, DNCMS 08, Naples, Italy, October 5th, 2008
Data analysis has to cope with continuously growing data volumes and computational requirements. Distributed computing lets users profit from parallel processing, and for data-intensive applications the data itself can also be organized in a distributed manner. However, distributed data organization carries a higher risk of data loss. In this paper, we investigate the effects of storage faults on data-intensive parallel cluster applications. In particular, we evaluate different scenarios for organizing processing and data, based on technologies such as the Condor and Hadoop cluster management systems.