Fraunhofer-Gesellschaft

Publica

Hier finden Sie wissenschaftliche Publikationen aus den Fraunhofer-Instituten.

Identification of disease-distinct complex biomarker patterns by means of unsupervised machine-learning using an interactive R toolbox (Umatrix)

 
: Lötsch, J.; Lerch, F.; Djaldetti, R.; Tegder, I.; Ultsch, A.

:
Fulltext ()

Big data analytics 3 (2018), Art. 5, 17 pp.
ISSN: 2058-6345
English
Journal Article, Electronic Publication
Fraunhofer IME ()

Abstract
Background. Unsupervised machine-learned analysis of cluster structures, applied using the emergent self-organizing feature maps (ESOM) combined with the unified distance matrix (U-matrix) has been shown to provide an unbiased method to identify true clusters. It outperforms classical hierarchical clustering algorithms that carry a considerable tendency to produce erroneous results. To facilitate the application of the ESOM/U-matrix method in biomedical research, we introduce the interactive R-based bioinformatics tool “Umatrix”, which enables valid identification of a biologically meaningful cluster structure in the data by training a Kohonen-type self-organizing map followed by interface-guided interactive clustering on the emergent U-matrix map. Results. The ability to detect clinical relevant subgroups was applied to a data set comprising plasma concentrations of d = 25 lipid markers including endocannabinoids, lysophosphatidic acids, ceramides and sphingolipids acquired from n = 100 patients with Parkinson's disease and n = 100 controls. Following ESOM training, clear data structures in the high-dimensional data space were observed on the U-matrix, allowing separation of patients from controls almost perfectly. When the data structure was destroyed by Monte-Carlo random resampling, the U-matrix became unstructured and patients and controls were mixed. Obtained results are biologically plausible and supported by empirical evidence of a regulation of several classes of lipids in Parkinson's disease. Conclusions. Sophisticated analysis of structures in biomedical data provides a basis for the mechanistic interpretation of the observations and facilitates subsequent analyses focusing on hypothesis testing. The freely available R library “Umatrix” provides an interactive tool for broader application of unsupervised machine learning on complex biomedical data.

: http://publica.fraunhofer.de/documents/N-569074.html