2025
Conference Paper
Title

Crossing Domains without Labels: Distant Supervision for Term Extraction

Abstract
Automatic Term Extraction (ATE) is a critical component in downstream NLP tasks such as document tagging, ontology construction and patent analysis. Current state-of-the-art methods require expensive human annotation and struggle with domain transfer, limiting their practical deployment. This highlights the need for more robust, scalable solutions and realistic evaluation settings. To address this, we introduce a comprehensive benchmark spanning seven diverse domains, enabling performance evaluation at both the document and corpus levels. Furthermore, we propose a robust LLM-based model that outperforms both supervised cross-domain encoder models and few-shot learning baselines and performs competitively with its GPT-4o teacher on this benchmark. The first step of our approach is generating pseudo-labels with this black-box LLM on general and scientific domains to ensure generalizability. Building on this data, we fine-tune the first LLMs for ATE. To further enhance document-level consistency, which is often needed for downstream tasks, we introduce lightweight post-hoc heuristics. Our approach outperforms previous approaches on 5 of 7 domains, with an average improvement of 10 percentage points. We release our dataset and fine-tuned models to support future research in this area.
Author(s)
Senger, Elena
Fraunhofer-Zentrum für Internationales Management und Wissensökonomie IMW  
Campbell Borges, Yuri Cassio
Fraunhofer-Zentrum für Internationales Management und Wissensökonomie IMW  
Goot, Rob van der
IT University of Copenhagen  
Plank, Barbara
MaiNLP, Center for Information and Language Processing, Munich
Mainwork
The 2025 Conference on Empirical Methods in Natural Language Processing - proceedings of the industry track  
Conference
Conference on Empirical Methods in Natural Language Processing 2025  
Open Access
DOI
10.18653/v1/2025.emnlp-industry.95
Additional link
Full text
Language
English
Fraunhofer-Zentrum für Internationales Management und Wissensökonomie IMW  