LC-QuAD 2.0: A Large Dataset for Complex Question Answering over Wikidata and DBpedia

Dubey, Mohnish; Banerjee, Debayan; Abdelkawi, Abdelrahman; Lehmann, Jens

doi:10.1007/978-3-030-30796-7_5

2019

Conference Paper

Abstract

Providing machines with the capability of exploring knowledge graphs and answering natural language questions has been an active area of research over the past decade. In this direction, translating natural language questions into formal queries has been one of the key approaches. To advance the research area, several datasets such as WebQuestions, QALD and LC-QuAD have been published in the past. The largest dataset available for complex questions over knowledge graphs (LC-QuAD) contains five thousand questions. We now provide LC-QuAD 2.0 (Large-Scale Complex Question Answering Dataset) with 30,000 questions, their paraphrases and their corresponding SPARQL queries. LC-QuAD 2.0 is compatible with both the Wikidata and DBpedia 2018 knowledge graphs. In this article, we explain how the dataset was created and illustrate the variety of available questions with examples. We further provide a statistical analysis of the dataset.

Mainwork

The Semantic Web - ISWC 2019. 18th International Semantic Web Conference. Proceedings. Pt.II

Conference

International Semantic Web Conference (ISWC) 2019
