2010
Conference Paper
Title
Combining statistical independence testing, visual attribute selection and automated analysis to find relevant attributes for classification
Abstract
We present an iterative strategy for finding a relevant subset of attributes for classification in high-dimensional, heterogeneous data sets. The selected attribute subset is then used to construct a classifier function. To cope with the challenge of scalability, the analysis is split into an overview of all attributes and a detailed analysis of small groups of attributes. The overview provides generic information on statistical dependencies between attributes. Using this information, the user can select groups of attributes and an analytical method for their detailed analysis. The detailed analysis involves identifying redundant attributes (via classification or regression) and creating summarizing attributes (via clustering or dimension reduction). Our strategy does not prescribe specific analytical methods. Instead, we recursively combine the results of different methods to find or generate a subset of attributes to use for classification.
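The abstract does not say which statistical test drives the overview of attribute dependencies. As a minimal sketch, assuming categorical (or already discretized) attributes, the overview step could run a permutation test of independence on the Pearson chi-square statistic for every attribute pair and flag pairs whose p-value falls below a threshold. The function names (`chi_square_statistic`, `permutation_p_value`, `dependency_overview`), the `alpha` threshold, and the choice of a permutation test are hypothetical illustrations, not the authors' method.

```python
import itertools
import random
from collections import Counter


def chi_square_statistic(x, y):
    """Pearson chi-square statistic for two categorical sequences of equal length."""
    n = len(x)
    joint = Counter(zip(x, y))
    count_x = Counter(x)
    count_y = Counter(y)
    stat = 0.0
    for a in count_x:
        for b in count_y:
            expected = count_x[a] * count_y[b] / n
            stat += (joint[(a, b)] - expected) ** 2 / expected
    return stat


def permutation_p_value(x, y, n_permutations=999, seed=0):
    """Permutation test of independence: shuffling y breaks any dependence on x."""
    rng = random.Random(seed)
    observed = chi_square_statistic(x, y)
    y_shuffled = list(y)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(y_shuffled)
        # Small tolerance so float rounding does not miscount tied tables.
        if chi_square_statistic(x, y_shuffled) >= observed - 1e-9:
            at_least_as_extreme += 1
    return (at_least_as_extreme + 1) / (n_permutations + 1)


def dependency_overview(data, alpha=0.05, n_permutations=999):
    """Hypothetical overview: p-value for every attribute pair plus the dependent pairs."""
    p_values = {}
    for a, b in itertools.combinations(sorted(data), 2):
        p_values[(a, b)] = permutation_p_value(data[a], data[b], n_permutations)
    dependent = [pair for pair, p in p_values.items() if p < alpha]
    return p_values, dependent


# Toy data (illustrative only): "b" copies "a", "c" is balanced against both.
data = {
    "a": [0, 0, 1, 1] * 10,
    "b": [0, 0, 1, 1] * 10,
    "c": [0, 1, 0, 1] * 10,
}
p_values, dependent = dependency_overview(data, n_permutations=199)
print(dependent)  # [('a', 'b')]
```

In this sketch the pairs in `dependent` would be candidate groups for the detailed analysis, where redundant attributes could be identified or summarized. A permutation test is used here only because it needs no distributional tables and runs with the standard library; any other independence test could be substituted.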