  • Publication
    Co-regularised support vector regression
    We consider a semi-supervised learning scenario for regression in which only a few labelled examples, many unlabelled instances, and different data representations (multiple views) are available. For this setting, we extend support vector regression with a co-regularisation term and obtain co-regularised support vector regression (CoSVR). In addition to labelled data, co-regularisation includes information from unlabelled examples by ensuring that models trained on different views make similar predictions. Ligand affinity prediction is an important real-world problem that fits into this scenario. The characterisation of the strength of protein-ligand bonds is a crucial step in the process of drug discovery and design. We introduce variants of the base CoSVR algorithm and discuss their theoretical and computational properties. For the CoSVR function class we provide a theoretical bound on the Rademacher complexity. Finally, we demonstrate the usefulness of CoSVR for the affinity prediction task and evaluate its performance empirically on different protein-ligand datasets. We show that CoSVR outperforms co-regularised least squares regression as well as existing state-of-the-art approaches for affinity prediction. Code and data related to this chapter are available at:
  • Publication
    Ligand-based virtual screening with co-regularised support vector regression
    We consider the problem of ligand affinity prediction as a regression task, typically with few labelled examples, many unlabelled instances, and multiple views on the data. In chemoinformatics, the prediction of binding affinities for protein ligands is an important but challenging task. As protein-ligand bonds trigger biochemical reactions, their characterisation is a crucial step in the process of drug discovery and design. However, the practical determination of ligand affinities is very expensive, whereas unlabelled compounds are available in abundance. Additionally, many different vectorial representations for compounds (molecular fingerprints) exist that cover different sets of features. For this task, we propose a co-regularisation approach, which extracts information from unlabelled examples by ensuring that individual models trained on different fingerprints make similar predictions. We extend support vector regression similarly to the existing co-regularised least squares regression (CoRLSR) and obtain co-regularised support vector regression (CoSVR). We empirically evaluate the performance of CoSVR on various protein-ligand datasets. We show that CoSVR outperforms CoRLSR as well as existing state-of-the-art approaches that do not take unlabelled molecules into account. Additionally, we provide a theoretical bound on the Rademacher complexity for CoSVR.
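
The co-regularisation idea described above can be illustrated with a minimal sketch. It only evaluates an objective for linear per-view predictors; it is not the authors' implementation or optimiser. The function names, the parameters `nu`, `lam` and `eps`, and the choice of a squared disagreement penalty on unlabelled examples are assumptions made for illustration.

```python
def eps_insensitive(residual, eps=0.1):
    """Epsilon-insensitive loss as used in support vector regression."""
    return max(0.0, abs(residual) - eps)


def predict(weights, x):
    """Linear prediction for one view (assumed model class for this sketch)."""
    return sum(w * xi for w, xi in zip(weights, x))


def co_regularised_objective(w_views, labelled, unlabelled,
                             nu=1.0, lam=1.0, eps=0.1):
    """Evaluate a co-regularised regression objective.

    w_views    : one weight vector per view
    labelled   : list of (views, y), where views holds one feature vector per view
    unlabelled : list of views (no target value)
    """
    total = 0.0
    # Per-view terms: norm regulariser plus loss on the labelled examples.
    for v, w in enumerate(w_views):
        total += nu * sum(wi * wi for wi in w)
        total += sum(eps_insensitive(y - predict(w, xs[v]), eps)
                     for xs, y in labelled)
    # Co-regularisation: penalise disagreement between views on unlabelled data.
    # A squared penalty is assumed here; other losses are possible.
    for u in range(len(w_views)):
        for v in range(u + 1, len(w_views)):
            for zs in unlabelled:
                diff = predict(w_views[u], zs[u]) - predict(w_views[v], zs[v])
                total += lam * diff * diff
    return total


# Two views with one feature each; the views disagree on the unlabelled example.
w_views = [[1.0], [1.0]]
labelled = [([[1.0], [1.0]], 1.0)]
unlabelled = [[[2.0], [3.0]]]
value = co_regularised_objective(w_views, labelled, unlabelled)
```

Setting `lam=0` removes the last term and leaves independent per-view support vector regression objectives, which is one way to see what the unlabelled examples contribute.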
  • Publication
    Context-based clustering of image search results
    In this work we propose to cluster image search results based on the textual contents of the referring webpages. The natural ambiguity and context-dependence of human languages lead to problems that plague modern image search engines: a user formulating a query usually has just one topic in mind, while the results produced to satisfy this query may (and usually do) belong to different topics. Therefore, only some of the search results are relevant to the user. One possible way to improve the user's experience is to cluster the results according to the topics they belong to and present the clustered results to the user. Compared with clustering based on visual features, an approach that uses the text of the webpages containing the image is less computationally intensive and provides the resulting clusters with semantically meaningful names.
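
A rough sketch of text-based clustering with named clusters is given below. The abstract does not specify the clustering algorithm, so this uses a simple greedy threshold grouping on bag-of-words cosine similarity as a stand-in; `STOPWORDS`, `threshold`, and all function names are hypothetical.

```python
import math
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not from the source).
STOPWORDS = {"the", "a", "of", "and", "in", "is", "to", "on", "for"}


def bag_of_words(text):
    """Term counts for the text of a referring webpage."""
    return Counter(t for t in text.lower().split() if t not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(c * c for c in a.values()))
    nb = math.sqrt(sum(c * c for c in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster_by_context(pages, threshold=0.3):
    """Greedily group (image_id, page_text) pairs by textual similarity.

    Returns a list of (cluster_name, image_ids); the name is the most
    frequent term among the pages in the cluster.
    """
    clusters = []
    for image_id, text in pages:
        vec = bag_of_words(text)
        best = max(clusters, key=lambda c: cosine(vec, c["centroid"]),
                   default=None)
        if best is not None and cosine(vec, best["centroid"]) >= threshold:
            best["members"].append(image_id)
            best["centroid"].update(vec)  # accumulate term counts
        else:
            clusters.append({"centroid": Counter(vec), "members": [image_id]})
    named = []
    for c in clusters:
        top = c["centroid"].most_common(1)
        named.append((top[0][0] if top else "", c["members"]))
    return named


pages = [
    ("img1", "jaguar car engine car"),
    ("img2", "jaguar cat jungle cat"),
    ("img3", "car dealer car engine"),
]
result = cluster_by_context(pages)
```

In this toy input the ambiguous query term "jaguar" appears on pages about two topics, and the surrounding words separate them. A practical system would likely also down-weight the query term itself and use a more robust clustering method.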
  • Patent
    Device and method for determining a pharmaceutical activity of a molecule
    (A1) A device for determining a pharmaceutical activity of a molecule (M) comprises means (110) for determining atomic structures occurring in the molecule, means (120) for assigning a feature index (MI), means (130) for determining a feature vector (MV), and means (140) for determining a membership. The means (120) assigns the feature index (MI) to one of the atomic structures occurring in the molecule (M) depending on the respective atomic structure and a neighbourhood of the respective atomic structure in the molecule (M). The means (130) determines the feature vector (MV) for the molecule (M) depending on the assigned feature index (MI), wherein the feature vector (MV) points to a point in a feature space (MR), and wherein the feature space (MR) has a first domain (A), which corresponds to pharmaceutically active molecules, and a second domain (B), which corresponds to pharmaceutically inactive molecules. The means (140) determines the membership of the point in the first domain (A) or the second domain (B).
  • Publication
    Support-vector-machine-based ranking significantly improves the effectiveness of similarity searching using 2D fingerprints and multiple reference compounds
    (2008)
    Geppert, H.
    ; ; ;
    Bajorath, J.
    Similarity searching using molecular fingerprints is computationally efficient and a surprisingly effective virtual screening tool. In this study, we have compared ranking methods for similarity searching using multiple active reference molecules. Different 2D fingerprints were used as search tools and also as descriptors for a support vector machine (SVM) algorithm. In systematic database search calculations, an SVM-based ranking scheme consistently outperformed nearest neighbor and centroid approaches, regardless of the fingerprints that were tested, even if only very small training sets were used for SVM learning. The superiority of SVM-based ranking over conventional fingerprint methods is ascribed to the fact that SVM makes use of information about database molecules, in addition to known active compounds, during the learning phase.
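
The two conventional baselines named above, nearest neighbor and centroid ranking with multiple reference compounds, can be sketched as follows. The abstract does not state the similarity measure; the Tanimoto coefficient is assumed here because it is commonly used with fingerprints. Fingerprints are represented as sets of "on" bit positions, and all names are hypothetical. The SVM-based ranking itself is not reproduced.

```python
def tanimoto(a, b):
    """Tanimoto coefficient for sparse non-negative vectors {bit: weight}."""
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    denom = (sum(w * w for w in a.values())
             + sum(w * w for w in b.values()) - dot)
    return dot / denom if denom else 0.0


def as_vector(bits):
    """Turn a set of 'on' bit positions into a sparse binary vector."""
    return {b: 1.0 for b in bits}


def nearest_neighbour_scores(database, references):
    """Score each database molecule by its best similarity to any reference."""
    refs = [as_vector(r) for r in references]
    return {name: max(tanimoto(as_vector(fp), r) for r in refs)
            for name, fp in database.items()}


def centroid_scores(database, references):
    """Score each database molecule by similarity to the averaged reference."""
    centroid = {}
    for r in references:
        for b in r:
            centroid[b] = centroid.get(b, 0.0) + 1.0 / len(references)
    return {name: tanimoto(as_vector(fp), centroid)
            for name, fp in database.items()}


def rank(scores):
    """Molecule names ordered from most to least similar."""
    return sorted(scores, key=scores.get, reverse=True)


references = [{1, 2, 3}, {2, 3, 4}]           # known active compounds
database = {"m1": {1, 2, 3}, "m2": {2, 3}, "m3": {7, 8}}
nn = nearest_neighbour_scores(database, references)
cs = centroid_scores(database, references)
```

Both baselines use only the active reference compounds. An SVM ranker would additionally be trained with database molecules as the second class, which is the extra information the abstract credits for its better performance.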