Extending workflow management for knowledge discovery in clinico-genomic data.
Recent advances in research methods and technologies have resulted in an explosion of information and knowledge about cancers and their treatment. Knowledge Discovery (KD) is a key technique for dealing with this massive amount of data and for managing the steadily growing body of available knowledge. In this paper, we present the ACGT integrated project, which aims to contribute to the resolution of these problems by developing semantic grid services in support of multi-centric, post-genomic clinical trials. In particular, we describe the challenges of KD in clinico-genomic data in a collaborative Grid framework, and present our approach to overcoming these difficulties by improving workflow management and construction, as well as the management of workflow results and provenance information. Our approach combines several techniques into a framework suited to addressing the problems of interactivity and multiple dependencies between workflows, services, and data.