September 20, 2024
Master Thesis
Title
PDataViewer: Investigation of Parkinson's Disease Landscape and Enabling Semantic Data Harmonization Through Language Models
Abstract
Parkinson’s disease poses significant research challenges due to its biological complexity. While cohort studies have advanced our understanding, they often suffer from biases related to selection criteria and inconsistent naming systems across studies. Effective exploration of Parkinson’s disease, via cross-cohort analyses, requires rigorous data harmonization to enhance the generalizability of findings. Previous initiatives, such as the Observational Medical Outcomes Partnership and the Common Data Element Catalog, have made strides but did not fully address the specific needs of Parkinson’s disease research.
To tackle these issues, we developed PASSIONATE, a common data model tailored for the Parkinson’s disease research community. PASSIONATE emphasizes the unique aspects of Parkinson’s disease data and is cross-referenced with existing ontologies where applicable to further standardize the naming system. Additionally, we created PDataViewer, a web application that helps researchers locate suitable cohort studies based on variable queries. The tool also visualizes biomarker distributions and provides insight into participant drop-off rates.
PDataViewer includes an auto-harmonization tool that streamlines the semantic harmonization of cohort data, allowing users to align variables with PASSIONATE-defined terms as well as those from other cohorts. It employs a vector database for semantic harmonization, ensuring scalability and efficiency. By facilitating the identification of comparable cohorts, PDataViewer is crucial for conducting cross-cohort analyses and improves transparency in patient-level data. The separation of backend and frontend components enhances participant data security, addressing privacy concerns that are common in cohort studies.
Overall, the introduction of PASSIONATE and PDataViewer improves the findability, interoperability, and transparency of data in Parkinson’s disease research, potentially paving the way for federated learning approaches in the field.
Thesis Note
Bonn, Univ., Master Thesis, 2024
Advisor(s)
Open Access
Rights
CC BY 4.0: Creative Commons Attribution
Language
English