October 7, 2025
Master Thesis
Title
Bringing Biomedical Artificial Intelligence into Practice: Graph-Based Hypothesis Validation Using BioAssays and Large Language Models
Abstract
In recent years, advances in artificial intelligence, natural language processing, and knowledge graphs have produced a flood of biomedical hypotheses. However, while hypothesis generation has advanced rapidly, experimental validation remains a bottleneck. In biomedicine, validation often requires carefully choosing the right assay, yet this process is still largely manual, error-prone, and limited by the expertise of individual researchers. This thesis addresses this gap by developing a workflow to systematically retrieve and rank bioassays from PubChem that can support the validation of structured biomedical hypotheses. The workflow combines multiple steps: embedding bioassay and hypothesis descriptions, extracting biomedical entities with the Unified Medical Language System, constructing integrated knowledge graphs, and computing cosine similarity to match hypotheses with assays. The top-ranked assays are then passed to a large language model for justification and categorization into levels of support. The results show that the workflow can successfully identify assays relevant to hypotheses, not only finding direct experimental matches but also highlighting indirect or adaptable assays, which reflects how validation often works in real laboratory practice. In comparative testing, the Qwen embedding model generally produced more realistic similarity scores than the domain-specific Simonlee model. In addition, GPT outperformed the other large language models in providing accurate and interpretable justifications, especially when guided by the curated candidate list generated by the pipeline. Overall, the thesis provides a proof of concept for bridging the gap between in silico hypothesis generation and real-world validation.
With further improvements in assay curation, BioAssay Ontology annotations, and integration of cellular context, this approach could be developed into an interactive system where researchers directly input hypotheses and receive ranked, justified assay suggestions to guide experimental planning.
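The matching step named in the abstract, computing cosine similarity between a hypothesis embedding and assay embeddings and keeping the top-ranked assays, can be illustrated with a minimal sketch. This is not the thesis implementation: the function names (`cosine_similarity`, `rank_assays`), the assay identifiers, and the three-dimensional toy vectors are all hypothetical, and real embeddings would come from an embedding model rather than being written by hand.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_assays(hypothesis_vec, assay_vecs, top_k=3):
    """Return the top_k (assay_id, score) pairs, highest similarity first."""
    scored = [
        (assay_id, cosine_similarity(hypothesis_vec, vec))
        for assay_id, vec in assay_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings, chosen only to make the ranking easy to follow.
hypothesis = [1.0, 0.0, 1.0]
assays = {
    "assay_A": [1.0, 0.0, 1.0],  # same direction as the hypothesis
    "assay_B": [0.0, 1.0, 0.0],  # orthogonal to the hypothesis
    "assay_C": [1.0, 1.0, 0.0],  # partial overlap
}

ranked = rank_assays(hypothesis, assays, top_k=2)
for assay_id, score in ranked:
    print(f"{assay_id}: {score:.3f}")
```

In this toy setup `assay_A` ranks first and `assay_C` second. In the workflow described above, a shortlist of this kind is what would be handed to a large language model for justification and categorization.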
Thesis Note
Bonn-Aachen, Univ., Master Thesis, 2025
Author(s)
Advisor(s)
Open Access
File(s)
Rights
CC BY 4.0: Creative Commons Attribution
Language
English