2024
Conference Paper
Title

Enhancing Digital Libraries with Automated Definition Generation

Abstract
Scientific domains encompass many concepts that require a concise term definition to enable a common understanding among researchers, particularly in interdisciplinary fields. In digital libraries, information access and sharing are often facilitated by terminology databases. However, building such resources manually is expensive and requires expert knowledge. Automatically generating definitions for scientific terms has recently become an active research topic and can reduce this manual burden. Current methods, however, rely heavily on large language models (LLMs) that store factual knowledge in their parameters, so that knowledge cannot easily be updated for emerging scientific terms. Furthermore, a major shortcoming of these models is that they are prone to hallucination and their output is difficult to control. To bridge these gaps, we propose to address the task of definition generation through guided abstractive summarization, incorporating key information from external resources. At test time, we augment the model with abstracts retrieved from Scopus and use automatically extracted topics and keywords as guidance, both of which are essential for definition generation. To this end, our approach takes into account two relevant sub-tasks: a) predicting the topic class and b) generating hypernym candidates for the term. Our proposed pipelined approach for automatic guided definition generation achieves a significant performance improvement over the standard baselines as well as over relevant prior work on this problem. We use BLEU, ROUGE and BERTScore to automatically evaluate the quality of the systems on our benchmark and carry out a human evaluation to assess the fluency, relevance, coherence and factuality of the output. Our experiments show that LLMs can provide fluent and coherent definitions that are often on par with human-created definitions. Yet there is still room for improvement in identifying relevant content and in factual correctness.
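The automatic evaluation mentioned in the abstract compares a generated definition with a reference definition. As a rough illustration only, the sketch below computes a simplified ROUGE-1 F1 score (unigram overlap) using the Python standard library. It is not the authors' evaluation code: the tokenizer, the function names `tokenize` and `rouge1_f1`, and the example sentences are assumptions made for this sketch, and established ROUGE implementations typically add options such as stemming.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase alphanumeric tokens; real ROUGE tooling may also stem.
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1: F1 over clipped unigram overlap."""
    cand_counts = Counter(tokenize(candidate))
    ref_counts = Counter(tokenize(reference))
    # Counter intersection keeps the minimum count per token (clipped overlap).
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical example texts, not taken from the paper's benchmark.
    generated = "A heat pump is a device that transfers heat from a cold to a warm place."
    reference = "A heat pump is a device that moves thermal energy from a cooler space to a warmer one."
    print(f"ROUGE-1 F1: {rouge1_f1(generated, reference):.3f}")
```

A surface-overlap score of this kind rewards shared wording rather than correctness, which is one reason such metrics are usually complemented by human judgments of factuality.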
Author(s)
Zielinski, Andrea
Fraunhofer-Institut für System- und Innovationsforschung ISI
Hirzel, Simon
Fraunhofer-Institut für System- und Innovationsforschung ISI
Arnold-Keifer, Sonja
Fraunhofer-Institut für System- und Innovationsforschung ISI
Mainwork
JCDL 2024, 24th ACM/IEEE Joint Conference on Digital Libraries. Proceedings  
Conference
Joint Conference on Digital Libraries 2024  
Open Access
Rights
CC BY 4.0: Creative Commons Attribution
DOI
10.1145/3677389.3702536
10.24406/publica-4457
Language
English
Fraunhofer-Institut für System- und Innovationsforschung ISI  
Keyword(s)
  • Natural language generation
  • Terminology
  • Definitions
  • Large language models
  • Indexing
  • Information retrieval
  • Information systems
  • Digital libraries
  • Natural language processing