2019
Journal Article
Title
Prognostically Relevant Subtypes and Survival Prediction for Breast Cancer Based on Multimodal Genomics Data
Abstract
Cancer is one of the deadliest diseases, caused by abnormal behavior of genes that control cell division and growth. Clinical decisions for cancer patients draw on genomics data and clinical outcomes from multiplatform, heterogeneous sources, and both the multimodality and the heterogeneity of these data pose significant challenges for bioinformatics tools and algorithms. Numerous works have tried to overcome these challenges by using sophisticated bioinformatics and machine learning (ML) algorithms as either primary or supporting tools. In this paper, we propose a new approach that analyzes genomics data from The Cancer Genome Atlas (TCGA) to classify breast cancer patients by subtype and to predict their survival. Because multiple factors, such as estrogen receptor (ER), progesterone receptor (PGR), and human epidermal growth factor receptor 2 (HER2) statuses, are involved in breast cancer diagnosis, we used DNA methylation, gene expression (GE), and miRNA expression data and built a multiplatform network, the Multimodal Autoencoders (MAE) classifier, that supports each data type. Experimental results demonstrate that our approach is promising for predicting both breast cancer subtypes and survival. In particular, we achieved state-of-the-art accuracies of 91% and 86% for ER- and PGR-based subtype prediction, respectively, and moderately low accuracy for HER2-based subtype prediction; for survival prediction, we obtained reasonably low mean squared error (MSE) and positive coefficient of determination (R²) scores. Additionally, we created unimodal and multimodal features from each input type and trained decision tree (DT), Naive Bayes (NB), K-nearest neighbors (KNN), logistic regression (LR), support vector machine (SVM), random forest (RF), and gradient boosting trees (GBT) models as ML baselines. Finally, we used a model-averaging ensemble of the top-3 models to report the final prediction.
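The abstract does not say how the top-3 models are chosen or what exactly is averaged. The following is a minimal sketch of a top-k model-averaging ensemble. It assumes the models are ranked by a validation score (higher is better) and that their per-sample outputs are averaged with equal weights. For subtype classification those outputs are class-1 probabilities; for survival prediction they are predicted survival values. The function names `top_k_average` and `to_labels`, the 0.5 threshold, and all numbers below are hypothetical and not taken from the paper.

```python
from statistics import mean


def top_k_average(val_scores, predictions, k=3):
    """Average the predictions of the k models with the highest validation scores.

    Hypothetical helper (not from the paper).
    val_scores:  dict mapping model name -> validation score (higher is better)
    predictions: dict mapping model name -> list of per-sample predictions
                 (class-1 probabilities for subtype classification, or
                 predicted survival values for survival regression)
    """
    if k > len(val_scores):
        raise ValueError("k exceeds the number of available models")
    # Rank model names by validation score, best first, and keep the top k.
    ranked = sorted(val_scores, key=val_scores.get, reverse=True)[:k]
    n_samples = len(predictions[ranked[0]])
    # Unweighted (equal-weight) mean of the selected models' predictions, per sample.
    averaged = [mean(predictions[m][i] for m in ranked) for i in range(n_samples)]
    return ranked, averaged


def to_labels(probabilities, threshold=0.5):
    """Turn averaged class-1 probabilities into hard 0/1 labels (assumed threshold)."""
    return [1 if p >= threshold else 0 for p in probabilities]


# Hypothetical validation accuracies and class-1 probabilities for two patients
# (illustrative numbers only, not results from the paper).
val_scores = {"DT": 0.70, "NB": 0.72, "KNN": 0.75, "LR": 0.83,
              "SVM": 0.85, "RF": 0.82, "GBT": 0.86}
predictions = {m: [0.5, 0.5] for m in val_scores}
predictions.update({"GBT": [0.9, 0.2], "SVM": [0.8, 0.4], "LR": [0.7, 0.6]})

ranked, averaged = top_k_average(val_scores, predictions, k=3)
print(ranked)               # ['GBT', 'SVM', 'LR']
print(to_labels(averaged))  # [1, 0]
```

Equal weighting is the simplest form of model averaging. A weighted variant, for example weighting each model by its validation score, would be a different design choice that the abstract does not specify. For survival prediction, the averaged values would be used directly and scored with MSE and R² instead of being thresholded.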