July 2023
Conference Paper
Title

A New Aligned Simple German Corpus

Abstract
"Leichte Sprache", the German counterpart to Simple English, is a regulated language aiming to facilitate complex written language that would otherwise stay inaccessible to different groups of people. We present a new sentence-aligned monolingual corpus for Simple German - German. It contains multiple document-aligned sources which we have aligned using automatic sentence-alignment methods. We evaluate our alignments based on a manually labelled subset of aligned documents. The quality of our sentence alignments, as measured by the F1-score, surpasses previous work. We publish the dataset under CC BY-SA and the accompanying code under MIT license.
Author(s)
Toborek, Vanessa
Universität Bonn  
Busch, Moritz
Universität Bonn  
Boßert, Malte
Universität Bonn  
Bauckhage, Christian  
Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS  
Welke, Pascal
Universität Bonn  
Mainwork
ACL 2023, 61st Conference of the Association for Computational Linguistics. Proceedings. Vol. 1: Long Papers
Project(s)
The Lamarr Institute for Machine Learning and Artificial Intelligence  
Funder
Bundesministerium für Bildung und Forschung  
Conference
Association for Computational Linguistics (ACL Annual Meeting) 2023  
Open Access
DOI
10.18653/v1/2023.acl-long.638
Language
English
Keyword(s)
  • Alignment methods
  • F1 scores
  • Multiple documents
  • Sentence alignment
  • Simple++
