2014
Conference Paper
Title

FactRunner: A new system for NLP-based information extraction from Wikipedia

Abstract
Wikipedia is playing an increasing role as a source of human-readable knowledge, because it contains an enormous amount of high-quality information written by human authors. Finding a relevant piece of information in this huge collection of natural language text is often time-consuming, as a keyword-based search interface is the main method for querying. Users therefore have to explore the document collection iteratively to find the information of interest. In this paper, we present an approach to extract structured information from unstructured documents to enable structured queries. Information Extraction (IE) systems have been proposed for this task, but due to the complexity of natural language, they often produce unsatisfactory results. As Wikipedia contains, in addition to the plain natural language text, links between documents and other metadata, we propose an approach which exploits this information to extract more accurate structured information. Our proposed system, FactRunner, focuses on extracting structured information from sentences containing such links, because these sentences may carry more accurate information than sentences without links. We evaluated our system on a subset of documents from Wikipedia and compared the results with another existing system. The results show that a natural language parser combined with Wikipedia markup can be exploited for extracting facts in the form of triple statements with high accuracy.
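The idea of focusing on sentences that contain links can be illustrated with a small sketch. It is not FactRunner's implementation, which the abstract does not describe in detail. It assumes MediaWiki-style `[[Target|anchor]]` markup, and the names `WIKILINK`, `split_sentences`, and `linked_sentences` are hypothetical. The regex-based sentence splitter is a stand-in for the natural language parser mentioned above.

```python
import re

# Matches MediaWiki internal links: [[Target]] or [[Target|anchor text]].
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def split_sentences(text):
    """Naive splitter (assumption): a real system would use an NLP parser."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def linked_sentences(wikitext):
    """Keep only sentences that contain at least one link.

    Returns a list of (plain_sentence, links) pairs, where links maps each
    anchor text to the title of the linked article.
    """
    results = []
    for sentence in split_sentences(wikitext):
        links = {}

        def replace(match):
            target = match.group(1).strip()
            anchor = (match.group(2) or target).strip()
            links[anchor] = target
            return anchor  # leave only the visible anchor text in the sentence

        plain = WIKILINK.sub(replace, sentence)
        if links:
            results.append((plain, links))
    return results


sample = (
    "[[Albert Einstein|Einstein]] was born in [[Ulm]]. "
    "He liked sailing. "
    "He later worked in [[Bern]]."
)
selected = linked_sentences(sample)
```

In this sketch the second sentence is dropped because it has no link, while the other two are kept together with a mapping from anchor text to article title. A later step, not shown here, could parse each kept sentence and emit subject-predicate-object triples whose entities are resolved through that mapping.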
Author(s)
Sutoyo, R.
Quix, C.
Kastrati, F.
Mainwork
Web information systems and technologies. 9th international conference, WEBIST 2013  
Conference
International Conference on Web Information Systems and Technologies (WEBIST) 2013  
DOI
10.1007/978-3-662-44300-2_14
Language
English
Fraunhofer-Institut für Angewandte Informationstechnik FIT  