Fraunhofer-Gesellschaft
2025
Conference Paper
Title

Historic to FAIR: Leveraging LLMs for Historic Term Identification and Standardization

Abstract
As the availability of historical biodiversity data continues to grow, ensuring its usability through adherence to the FAIR principles (Findable, Accessible, Interoperable, and Reusable) has become increasingly essential. This study addresses a key challenge in biodiversity data drawn from historical texts: identifying and interpreting common species names and scientific names. We highlight five main issues associated with historical common names: variations in spelling, the creation of new terms, shifts from broad historical names to more specific modern ones, shifts from specific historical names to broader modern ones, and the renaming of historical terms. To tackle these challenges, we explore the application of a large language model (GPT-4) for entity detection and terminology alignment. Our findings demonstrate that GPT-4, when provided with a small context, can effectively identify both historical common species names and modern scientific names. On a test dataset, the model achieved a 92% success rate in detecting historical common names and correctly identified 98% of scientific terms. Additionally, for four of the five identified challenges, the LLM provided meaningful insights, including successfully matching historical common names to their modern counterparts. We demonstrate that the model has an embedded understanding of how biodiversity terminology has evolved, which underscores its potential to mobilize historical biodiversity data according to the FAIR principles.
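The detection step described in the abstract can be pictured with a minimal sketch. It assumes three things. First, the model is asked, with a small context, to return historical common names and scientific names as JSON. Second, its reply is parsed into two lists. Third, "success rate" is read as the share of gold-standard terms that appear in the output under case-insensitive exact matching. The prompt wording, the function names (`build_prompt`, `parse_reply`, `detection_rate`), the example passage and the stand-in reply are all hypothetical. The sketch is not the authors' prompt or evaluation code, and it calls no LLM.

```python
import json
from dataclasses import dataclass

# Hypothetical small-context prompt; NOT the wording used in the paper.
# Doubled braces are literal braces once str.format() is applied.
PROMPT_TEMPLATE = (
    "You are an expert in historical biodiversity texts.\n"
    "Context: the passage below comes from a historical source and may use "
    "archaic spellings of animal and plant names.\n"
    "List every historical common species name and every scientific (Latin) "
    "name in the passage. Answer only with JSON of the form "
    '{{"common_names": [...], "scientific_names": [...]}}.\n\n'
    "Passage:\n{passage}"
)


def build_prompt(passage: str) -> str:
    """Insert the historical passage into the (hypothetical) prompt template."""
    return PROMPT_TEMPLATE.format(passage=passage)


@dataclass
class Extraction:
    common_names: list[str]
    scientific_names: list[str]


def parse_reply(reply: str) -> Extraction:
    """Parse the model's JSON reply, stripping stray whitespace from each name."""
    data = json.loads(reply)
    return Extraction(
        common_names=[n.strip() for n in data.get("common_names", [])],
        scientific_names=[n.strip() for n in data.get("scientific_names", [])],
    )


def detection_rate(predicted: list[str], gold: list[str]) -> float:
    """Share of gold terms found among the predictions (case-insensitive exact match).

    Assumed metric definition; the paper's scoring may differ.
    """
    if not gold:
        return 1.0
    found = {p.casefold() for p in predicted}
    hits = sum(1 for g in gold if g.casefold() in found)
    return hits / len(gold)


if __name__ == "__main__":
    # Invented illustrative passage with an archaic spelling ("Wolfe").
    passage = "A Wolfe, two Ermines and an Otter were seen by the river; the first is Canis lupus."
    print(build_prompt(passage))
    # Stand-in for a model reply (no LLM is called); it misses "Otter".
    reply = '{"common_names": ["Wolfe", "Ermines"], "scientific_names": ["Canis lupus"]}'
    result = parse_reply(reply)
    print(round(detection_rate(result.common_names, ["Wolfe", "Ermines", "Otter"]), 2))
    print(detection_rate(result.scientific_names, ["Canis lupus"]))
```

In a real pipeline the hard-coded `reply` would come from an LLM call. Exact matching is deliberately strict. A study of spelling variants might instead accept normalized or fuzzy matches, and that choice would change the reported rate.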
Author(s)
Fillies, Jan
Teich, Maximilian
Karam, Naouel
Paschke, Adrian  
Fraunhofer-Institut für Offene Kommunikationssysteme FOKUS  
Rehbein, Malte
Mainwork
Datenbanksysteme für Business, Technologie und Web, BTW 2025. Workshopband  
Conference
Fachtagung "Datenbanksysteme für Business, Technologie und Web" 2025  
Open Access
Rights
CC BY-SA 4.0: Creative Commons Attribution-ShareAlike
DOI
10.18420/BTW2025-121
10.24406/publica-7165
Language
English
Keyword(s)
  • Large Language Models
  • FAIR Principles
  • Language Standardization
  • Data Interoperability
  • Historic Data
  • Semantic Annotation
  • Taxonomies