Publica
Here you will find scientific publications from the Fraunhofer Institutes.
Learning interpretable SVMs for biological sequence classification
 Miyano, S.; Mesirov, J.; Kasif, S.; Istrail, S.; Pevzner, P.; Waterman, M.: Research in computational molecular biology: 9th annual international conference, RECOMB 2005, Cambridge, MA, USA, May 14-18, 2005. Berlin: Springer, 2005 (Lecture notes in bioinformatics 3500), ISBN: 3540258663, pp. 389-407
 Annual International Conference on Research in Computational Molecular Biology (RECOMB) <9, 2005, Cambridge/Mass.> 

 English 
 Conference Paper 
 Fraunhofer FIRST
Abstract
We propose novel algorithms for solving the so-called Support Vector Multiple Kernel Learning problem and show how they can be used to understand the resulting support vector decision function. While classical kernel-based algorithms (such as SVMs) are based on a single kernel, in Multiple Kernel Learning a quadratically constrained quadratic program is solved in order to find a sparse convex combination of a set of support vector kernels. We show how this problem can be cast as a semi-infinite linear optimization problem, which can in turn be solved efficiently using a boosting-like iterative method in combination with standard SVM optimization algorithms. The proposed method is able to deal with thousands of examples while combining hundreds of kernels within reasonable time. In the second part we show how this technique can be used to understand the obtained decision function in order to extract biologically relevant knowledge about the sequence analysis problem at hand. We consider the problem of splice site identification and combine string kernels at different sequence positions and with various substring (oligomer) lengths. The proposed algorithm computes a sparse weighting over the length and the substring, highlighting which substrings are important for discrimination. Finally, we propose a bootstrap scheme in order to reliably identify a few statistically significant positions, which can then be used for further analysis such as consensus finding.
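To make the idea of a convex combination of position- and length-specific string kernels concrete, the following is a minimal sketch in Python. It is not the authors' implementation: the sub-kernel used here (an exact oligomer match at one position) is a simplified stand-in for the string kernels mentioned in the abstract, the function names `oligo_kernel`, `combined_kernel` and `normalize` are hypothetical, and the weights are hand-picked for illustration rather than learned by the proposed optimization.

```python
from itertools import product


def oligo_kernel(x, y, pos, k):
    """Simplified sub-kernel (an assumption, not the paper's kernel):
    1.0 if x and y carry the same k-mer starting at position pos, else 0.0."""
    if pos + k > len(x) or pos + k > len(y):
        return 0.0
    return 1.0 if x[pos:pos + k] == y[pos:pos + k] else 0.0


def combined_kernel(x, y, weights):
    """Weighted sum of sub-kernels; weights maps (position, length) -> beta.
    With non-negative weights summing to one this is a convex combination."""
    return sum(beta * oligo_kernel(x, y, pos, k)
               for (pos, k), beta in weights.items())


def normalize(weights):
    """Rescale non-negative weights so that they sum to one."""
    total = sum(weights.values())
    return {key: w / total for key, w in weights.items()}


# Uniform weighting over all positions and oligomer lengths 1..2
# for sequences of length 6 (illustrative sizes only).
seq_len, max_k = 6, 2
uniform = normalize({
    (pos, k): 1.0
    for pos, k in product(range(seq_len), range(1, max_k + 1))
    if pos + k <= seq_len
})

# A sparse weighting of the kind the abstract describes: only two
# (position, length) pairs carry weight. The values are hand-picked.
sparse = {(2, 2): 0.7, (4, 1): 0.3}

print(combined_kernel("AAGTAG", "CCGTCC", sparse))
```

In the sparse weighting only the dinucleotide at position 2 and the single nucleotide at position 4 contribute, so the non-zero weights can be read off directly as the positions and oligomer lengths that matter for the similarity. This is the sense in which a sparse weighting makes the decision function easier to interpret.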