Specification of distributed data mining workflows with DataMiningGrid
This chapter evaluates the benefits of grid-based technology from a data miner's perspective. It focuses on the DataMiningGrid, a standards-based and extensible environment for grid-enabling data mining applications. Three generic and very common data mining tasks are analyzed: enhancing scalability through data partitioning, comparing classifier performance, and optimizing parameters. Grid-based data mining, and the DataMiningGrid in particular, emerges as a general tool for enhancing the scalability of a large number of data mining applications. The basis for this broad applicability is the DataMiningGrid's extensibility mechanism. To support the three scenarios listed above, we have extended the original DataMiningGrid system with a set of new components.