CADA: phenotype-driven gene prioritization based on a case-enriched knowledge graph

Peng, Chengyao; Dieck, Simon; Schmid, Alexander; Ahmad, Ashar; Knaus, Alexej; Wenzel, Maren; Mehnert, Laura; Zirn, Birgit; Haack, Tobias; Ossowski, Stephan; Wagner, Matias; Brunet, Theresa; Ehmke, Nadja; Danyel, Magdalena; Rosnev, Stanislav; Kamphans, Tom; Nadav, Guy; Fleischer, Nicole; Fröhlich, Holger; Krawitz, Peter

doi:10.1093/nargab/lqab078

September 3, 2021

Journal Article

Abstract

Many rare syndromes can be well described and delineated from other disorders by a combination of characteristic symptoms. These phenotypic features are best documented with terms from the Human Phenotype Ontology (HPO), which are also increasingly used in electronic health records (EHRs). Many algorithms that perform HPO-based gene prioritization have been developed; however, the performance of many such tools suffers from an over-representation of atypical cases in the medical literature. This is particularly true if an algorithm cannot handle features that occur with reduced frequency in a disorder. With CADA, we built a knowledge graph based on both case annotations and disorder annotations. Using network representation learning, we achieve gene prioritization by link prediction. Our results suggest that CADA exhibits superior performance, particularly for patients who present with the pathognomonic findings of a disease. Additionally, information about how frequently a feature occurs can readily be incorporated when available. Crucial to the design of our approach is the use of the growing amount of phenotype–genotype information that diagnostic labs deposit in databases such as ClinVar. This makes CADA an ideal reference tool for differential diagnostics in rare disorders, and one that can be updated regularly.
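To make the idea of "gene prioritization by link prediction on a case-enriched knowledge graph" more concrete, here is a minimal sketch under stated assumptions. It is not CADA's implementation: CADA uses network representation learning, whereas this toy swaps in random walk with restart (RWR) as a simpler proximity score for the patient-to-gene link. The graph combines the three kinds of edges the abstract implies (disorder annotations linking genes to HPO terms, solved cases linking their terms to a diagnosed gene, and the HPO hierarchy), and edge weights stand in for feature frequencies. All node names (`GENE_A`, `T1`, `CASE_1`, ...) and weights are hypothetical.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected weighted graph as {node: {neighbour: weight}}."""
    graph = defaultdict(dict)
    for u, v, w in edges:
        graph[u][v] = graph[u].get(v, 0.0) + w
        graph[v][u] = graph[v].get(u, 0.0) + w
    return graph


def random_walk_with_restart(graph, seeds, restart=0.3, iters=100, tol=1e-12):
    """Stationary visiting probabilities of a walk that jumps back to `seeds`."""
    nodes = list(graph)
    start = {n: 0.0 for n in nodes}
    for s in seeds:
        start[s] = 1.0 / len(seeds)
    p = dict(start)
    for _ in range(iters):
        nxt = {n: restart * start[n] for n in nodes}
        for u in nodes:
            total = sum(graph[u].values())  # > 0: every node came from an edge
            for v, w in graph[u].items():
                # Move probability mass along edges in proportion to weight.
                nxt[v] += (1.0 - restart) * p[u] * w / total
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


def rank_genes(graph, patient_terms, genes, restart=0.3):
    """Score each candidate patient-gene link by RWR proximity; best first."""
    scores = random_walk_with_restart(graph, patient_terms, restart)
    return sorted(((g, scores[g]) for g in genes), key=lambda gs: gs[1], reverse=True)


# Hypothetical toy graph; weights on gene-term edges play the role of frequencies.
EDGES = [
    # Disorder annotations (gene -- HPO term)
    ("GENE_A", "T1", 1.0), ("GENE_A", "T2", 0.5),
    ("GENE_B", "T3", 1.0), ("GENE_B", "T4", 0.8),
    # Case annotations: a solved case links its terms to its diagnosed gene
    ("CASE_1", "T1", 1.0), ("CASE_1", "T4", 1.0), ("CASE_1", "GENE_A", 1.0),
    # HPO hierarchy (child -- parent)
    ("T1", "T0", 1.0), ("T3", "T0", 1.0),
]
GRAPH = build_graph(EDGES)

if __name__ == "__main__":
    for gene, score in rank_genes(GRAPH, ["T1", "T2"], ["GENE_A", "GENE_B"]):
        print(f"{gene}\t{score:.3f}")
```

For a patient annotated with `T1` and `T2`, `GENE_A` ranks first because it is directly linked to both terms. The `CASE_1` node shows how a solved case adds a shortcut between terms (`T4`) and a gene (`GENE_A`) that disorder annotations alone would not connect, which is the intuition behind enriching the graph with cases.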

Author(s)

Peng, Chengyao

Dieck, Simon

Schmid, Alexander

Ahmad, Ashar (Fraunhofer-Institut für Algorithmen und Wissenschaftliches Rechnen SCAI)

Knaus, Alexej

Wenzel, Maren

Mehnert, Laura

Zirn, Birgit

Haack, Tobias

Ossowski, Stephan

Wagner, Matias

Brunet, Theresa

Ehmke, Nadja

Danyel, Magdalena

Rosnev, Stanislav

Kamphans, Tom

Nadav, Guy

Fleischer, Nicole

Fröhlich, Holger (Fraunhofer-Institut für Algorithmen und Wissenschaftliches Rechnen SCAI)

Krawitz, Peter

Journal

NAR Genomics and Bioinformatics
