Entity recognition in information extraction

Hanafiah, Novita; Quix, Christoph


Nguyen, N.T.; King Mongkut's Institute of Technology, Bangkok:
Intelligent information and database systems. 6th Asian conference, ACIIDS 2014. Vol. 1: Bangkok, Thailand, April 7-9, 2014; Proceedings
Cham: Springer International Publishing, 2014 (Lecture Notes in Computer Science 8397)
ISBN: 978-3-319-05475-9 (Print)
ISBN: 978-3-319-05476-6 (Online)
Asian Conference "Intelligent Information and Database Systems" (ACIIDS) <6, 2014, Bangkok>
Fraunhofer FIT

Detecting and resolving entities is an important step in information retrieval applications. Humans recognize entities from context, but information extraction systems (IES) need sophisticated algorithms to do so. This paper describes the development and implementation of an entity recognition algorithm. The system is integrated with an IES that derives triples from unstructured text; as a result, the triples are more valuable for query answering because they refer to identified entities. A dictionary of entities and their contexts is built by extracting information from the Wikipedia encyclopedia. The entity recognition step computes a score based on context similarity, measured as cosine similarity with a tf-idf weighting scheme, and on string similarity. The implemented system shows good accuracy on Wikipedia articles, is domain independent, and recognizes entities of arbitrary types.
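The scoring step summarized in the abstract can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the authors' implementation. The tokenizer and stopword list, the smoothed idf formula, the use of `difflib.SequenceMatcher` as the string-similarity measure, and the linear combination with a weight `alpha` are hypothetical choices, since the abstract does not specify them. The two-entry `ENTITIES` dictionary is made up and stands in for the Wikipedia-derived dictionary.

```python
import math
import re
from collections import Counter
from difflib import SequenceMatcher

# Illustrative stopword list (assumption, not from the paper).
STOPWORDS = {"a", "an", "the", "is", "it", "to", "in", "and", "of", "has"}


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def idf_table(documents: list[list[str]]) -> dict[str, float]:
    """Smoothed inverse document frequency over the entity contexts (one common variant)."""
    n = len(documents)
    df = Counter(term for doc in documents for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf_vector(tokens: list[str], idf: dict[str, float]) -> dict[str, float]:
    """Raw term frequency times idf; terms absent from the dictionary get weight 0."""
    tf = Counter(tokens)
    return {term: freq * idf.get(term, 0.0) for term, freq in tf.items()}


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either vector is all zeros."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def recognize(mention: str, context: str, entities: dict[str, str],
              alpha: float = 0.7) -> list[tuple[str, float]]:
    """Rank candidate entities for a mention, best first.

    score = alpha * context_similarity + (1 - alpha) * string_similarity
    The linear combination and alpha = 0.7 are hypothetical.
    """
    names = list(entities)
    entity_tokens = [tokenize(entities[name]) for name in names]
    idf = idf_table(entity_tokens)
    context_vec = tfidf_vector(tokenize(context), idf)
    ranked = []
    for name, tokens in zip(names, entity_tokens):
        context_sim = cosine(context_vec, tfidf_vector(tokens, idf))
        string_sim = SequenceMatcher(None, mention.lower(), name.lower()).ratio()
        ranked.append((name, alpha * context_sim + (1 - alpha) * string_sim))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Toy stand-in for a Wikipedia-derived dictionary: entity name -> context text.
ENTITIES = {
    "Jaguar (animal)": "The jaguar is a large cat species native to the Americas. "
                       "It hunts prey in the rainforest.",
    "Jaguar Cars": "Jaguar is a British car manufacturer producing luxury cars and sports cars.",
}

if __name__ == "__main__":
    for name, score in recognize("Jaguar", "The new Jaguar sports car has a powerful engine.", ENTITIES):
        print(f"{name}: {score:.3f}")
```

With these toy entries, "Jaguar Cars" ranks first: the mention's context shares the weighted terms "car" and "sports" with that entry, while it shares only the low-idf term "jaguar" with the animal entry.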