
A mighty dataset for stress-testing question answering systems

Authors: Haarmann, Bastian; Martens, Claudio; Petzka, Henning; Napolitano, Giulio


Institute of Electrical and Electronics Engineers -IEEE-:
IEEE 12th International Conference on Semantic Computing, ICSC 2018 : Laguna Hills, California, USA, 31 January - 2 February 2018
Piscataway, NJ: IEEE, 2018
ISBN: 978-1-5386-4408-9 (electronic)
ISBN: 978-1-5386-4409-6 (print)
ISBN: 978-1-5386-4407-2
International Conference on Semantic Computing (ICSC) <12, 2018, Laguna Hills/Calif.>
Conference Paper
Fraunhofer IAIS

The general goal of semantic question answering systems is to provide correct answers to natural language queries, given a number of structured datasets. The increasingly broad deployment of question answering (QA) systems in everyday life requires a comparable and reliable rating of how well QA systems perform and how scalable they are. To achieve this, we developed a massive dataset of more than 2 million natural language questions and their corresponding SPARQL queries for the DBpedia dataset. We combined natural language processing and linked open data to automatically generate this large number of valid question-query pairs. Our aim is to assist the benchmarking or scoring of QA systems in terms of answering questions in a range of languages, retrieving answers from heterogeneous sources, or answering massive numbers of questions within a limited time. This dataset is well suited to stress-testing the scalability, speed and correctness of QA systems. As such, it has already been included in the Large-scale QA task of the Question Answering Over Linked Data (QALD) Challenge and in the HOBBIT project Question Answering Benchmark.
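
The abstract does not say how the question-query pairs were generated. As a rough illustration only, the following sketch assumes a simple template-based approach: one question pattern is filled with entity and property labels, and the matching SPARQL query is built from the corresponding DBpedia URIs. The template wording, the function name `generate_pairs`, and the small entity and property tables are hypothetical and are not taken from the paper.

```python
from itertools import product

# DBpedia namespace prefixes.
DBO = "http://dbpedia.org/ontology/"
DBR = "http://dbpedia.org/resource/"

# Hypothetical sample data: identifier -> human-readable label.
ENTITIES = {"Albert_Einstein": "Albert Einstein", "Marie_Curie": "Marie Curie"}
PROPERTIES = {"birthPlace": "birth place", "deathPlace": "death place"}


def generate_pairs(entities, properties):
    """Build one question and one SPARQL query per (entity, property) pair.

    This is an assumed, simplified generation scheme, not the authors' method.
    """
    pairs = []
    for (ent_id, ent_label), (prop_id, prop_label) in product(
        entities.items(), properties.items()
    ):
        question = f"What is the {prop_label} of {ent_label}?"
        query = (
            f"SELECT ?answer WHERE {{ <{DBR}{ent_id}> <{DBO}{prop_id}> ?answer . }}"
        )
        pairs.append({"question": question, "query": query})
    return pairs


if __name__ == "__main__":
    for pair in generate_pairs(ENTITIES, PROPERTIES):
        print(pair["question"])
        print(pair["query"])
```

Because the number of pairs grows with the product of entities and properties, even a handful of templates applied to a large knowledge base could plausibly yield millions of pairs, which is the scale the abstract reports.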