September 2025
Journal Article
Title

From BERT to generative AI - Comparing encoder-only vs. large language models in a cohort of lung cancer patients for named entity recognition in unstructured medical reports

Abstract
Background: Extracting clinical entities from unstructured medical documents is critical for improving clinical decision support and documentation workflows. This study examines the performance of various encoder and decoder models trained for Named Entity Recognition (NER) of clinical parameters in pathology and radiology reports, highlighting the applicability of Large Language Models (LLMs) for this task.
Methods: Three NER methods were evaluated: (1) flat NER using transformer-based models, (2) nested NER with a multi-task learning setup, and (3) instruction-based NER utilizing LLMs. A dataset of 2013 pathology reports and 413 radiology reports, annotated by medical students, was used for training and testing.
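The flat NER setup described in the Methods can be pictured with a minimal sketch built on a Hugging Face token-classification pipeline. The model name, example sentence, and printed fields below are illustrative placeholders, not the models or report data used in the study.

```python
# Minimal sketch of a flat (non-nested) transformer NER setup, assuming a
# Hugging Face token-classification pipeline. The model and example text
# are illustrative placeholders, not the study's models or clinical reports.
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="dslim/bert-base-NER",      # placeholder general-domain NER model
    aggregation_strategy="simple",    # merge sub-word tokens into entity spans
)

report = "CT of the chest shows a 2.3 cm spiculated nodule in the right upper lobe."
for entity in ner(report):
    print(entity["entity_group"], entity["word"], round(float(entity["score"]), 3))
```

In the study's setting, the placeholder model would be replaced by a domain-specific (e.g. biomedical) pre-trained encoder fine-tuned on the annotated pathology and radiology reports.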
Results: The performance of encoder-based NER models (flat and nested) was superior to that of LLM-based approaches. The best-performing flat NER models achieved F1-scores of 0.87–0.88 on pathology reports and up to 0.78 on radiology reports, while nested NER models performed slightly lower. In contrast, multiple LLMs, despite achieving high precision, yielded significantly lower F1-scores (ranging from 0.18 to 0.30) due to poor recall. A contributing factor appears to be that these LLMs produce fewer but more accurate entities, suggesting they become overly conservative when generating outputs.
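As a brief illustration of the precision/recall trade-off reported above, the following sketch computes F1 from hypothetical precision and recall values; the numbers are illustrative only and are not taken from the paper.

```python
# Worked example: why high precision with poor recall still yields a low F1.
# The precision/recall values below are illustrative, not results from the paper.
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

print(f1(0.90, 0.12))  # ~0.21 -- low F1 despite high precision (LLM-like pattern)
print(f1(0.88, 0.87))  # ~0.87 -- balanced precision and recall (encoder-like pattern)
```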
Conclusion: LLMs in their current form are unsuitable for comprehensive entity extraction tasks in clinical domains, particularly when faced with a high number of entity types per document, though instructing them to return more entities in subsequent refinements may improve recall. Additionally, their computational overhead does not provide proportional performance gains. Encoder-based NER models, particularly those pre-trained on biomedical data, remain the preferred choice for extracting information from unstructured medical documents.
Author(s)
Arzideh, Kamyar
Schäfer, Henning
Allende-Cid, Héctor  
Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS  
Baldini, Giulia
Hilser, Thomas
Idrissi-Yaghir, Ahmad
Laue, Katharina
Chakraborty, Nilesh  
Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS  
Doll, Niclas
Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS  
Antweiler, Dario
Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS  
Klug, Katrin  
Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS  
Beck, Niklas  
Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS  
Giesselbach, Sven  
Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS  
Friedrich, Christoph M.
Nensa, Felix
Schuler, Martin
Hosch, René
Journal
Computers in Biology and Medicine
Project(s)
SmartHospital.NRW
Funder
Ministry for Economic Affairs, Industry, Climate Action and Energy of the State of North Rhine-Westphalia
Open Access
Rights
CC BY 4.0: Creative Commons Attribution
DOI
10.1016/j.compbiomed.2025.110665
10.24406/publica-4861
Language
English
Fraunhofer Institute
Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS
Fraunhofer Group
Fraunhofer-Verbund Gesundheit  