Here you will find scientific publications from the Fraunhofer Institutes.

A knowledge-extraction approach to identify and present verbatim quotes in free text

Authors: Paaß, Gerhard; Bergholz, Andre; Pilz, Anja

Postprint urn:nbn:de:0011-n-2173417 (532 KByte PDF)
MD5 Fingerprint: 1a39e3d90b9330ed78ea4578e5b47930
© ACM 2012 This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution.
Created on: 6.11.2012

Lindstaedt, Stefanie ; Association for Computing Machinery -ACM-:
I-KNOW 2012, 12th International Conference on Knowledge Management and Knowledge Technologies. Proceedings : 5 to 7 September 2012, Graz, Austria
New York: ACM, 2012
ISBN: 978-1-4503-1242-4
Art. 31, 4 pp.
International Conference on Knowledge Management and Knowledge Technologies (I-KNOW) <12, 2012, Graz>
Conference Paper, Electronic Publication
Fraunhofer IAIS
artificial intelligence; natural language processing; content analysis and indexing; linguistic processing; relation extraction; quote extraction; text mining; information extraction application

In news stories, verbatim quotes of persons play a very important role, as they carry reliable information about the opinion of that person concerning specific aspects. As thousands of new quotes are published every hour, it is very difficult to keep track of them. In this paper we describe a set of algorithms to solve the knowledge management problem of identifying, storing and accessing verbatim quotes. We handle the verbatim quote task as a relation extraction problem from unstructured text. Using a workflow of knowledge extraction algorithms, we provide the required features for the relation extraction algorithm. The central relation extraction procedure is trained using manually annotated documents. It turns out that structural grammatical information is able to improve the F-value for verbatim quote detection to 84.1%, which is sufficient for many exploratory applications. We present the results in a smartphone app connected to a web server, which employs a number of algorithms such as linkage to Wikipedia, topic extraction and search engine indices to provide flexible access to the extracted verbatim quotes.
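
To make the relation extraction framing concrete, the following minimal sketch shows one way candidate (person, quote) pairs could be generated from a sentence before a trained classifier decides which pairs are true quotations. It is not the authors' implementation: the function name `candidate_pairs`, the regular expression, the token-distance feature, and the assumption that person names are already known (for example from an upstream named-entity step) are all illustrative choices.

```python
import re
from itertools import product

# Assumption: quotes are delimited by straight or typographic double quotes.
QUOTE_RE = re.compile(r'["\u201c]([^"\u201c\u201d]+)["\u201d]')


def candidate_pairs(sentence, persons):
    """Return hypothetical (person, quote, token_distance) candidates.

    `persons` is assumed to come from an upstream named-entity recognizer.
    A trained relation classifier would then accept or reject each pair.
    """
    quotes = [(m.group(1), m.start(), m.end())
              for m in QUOTE_RE.finditer(sentence)]
    mentions = [(p, m.start(), m.end())
                for p in persons
                for m in re.finditer(re.escape(p), sentence)]

    pairs = []
    for (person, p_start, p_end), (quote, q_start, q_end) in product(mentions, quotes):
        # Skip names that occur inside the quoted span itself.
        if q_start <= p_start < q_end:
            continue
        # Text between the mention and the quote, whichever comes first.
        if p_end <= q_start:
            gap = sentence[p_end:q_start]
        else:
            gap = sentence[q_end:p_start]
        # Simple illustrative feature: number of tokens separating the two.
        pairs.append((person, quote, len(gap.split())))
    return pairs


if __name__ == "__main__":
    text = '"We will raise taxes," said Jane Doe, while John Roe disagreed.'
    for pair in candidate_pairs(text, ["Jane Doe", "John Roe"]):
        print(pair)
```

In this sketch the token distance is only a surface feature. The abstract reports that structural grammatical information improves detection, so a fuller feature set would presumably add features derived from a syntactic parse, which are omitted here.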