April 12, 2026
Conference Paper
Title

NERdME: A Named Entity Recognition Dataset for Indexing Research Artifacts in Code Repositories

Abstract
Existing scholarly information extraction (SIE) datasets focus on scientific papers and overlook implementation-level details in code repositories. README files describe datasets, source code, and other implementation-level artifacts; however, their free-form Markdown offers little semantic structure, making automatic information extraction difficult. To address this gap, NERdME is introduced: 200 manually annotated README files with over 10,000 labeled spans and 10 entity types. Baseline results using large language models and fine-tuned transformers show clear differences between paper-level and implementation-level entities, indicating the value of extending SIE benchmarks with entity types available in README files. A downstream entity-linking experiment was conducted to demonstrate that entities derived from READMEs can support artifact discovery and metadata integration.
Author(s)
Gesese, Genet Asefa
Chen, Zongxiong
Fraunhofer-Institut für Offene Kommunikationssysteme FOKUS  
Jiang, Shufan
Tan, Mary Ann
Liu, Zhaotai
Sack, Harald
Schimmler, Sonja  
Fraunhofer-Institut für Offene Kommunikationssysteme FOKUS  
Mainwork
WWW '26: Proceedings of the ACM Web Conference 2026  
Conference
Web Conference 2026  
Open Access
Rights
CC BY 4.0: Creative Commons Attribution
DOI
10.1145/3774904.3792934
10.24406/publica-8727
Language
English
Fraunhofer-Institut für Offene Kommunikationssysteme FOKUS  
Keyword(s)
  • github readme files
  • named entity recognition
  • scholarly information extraction
