Entity resolution by kernel methods

Pilz, Anja; Molzberger, L.; Paaß, Gerhard

2009

Conference Paper

Abstract

An important problem in text mining and semantic retrieval is entity resolution, which aims to determine the identity of a named entity. The name of a single entity may be written in several variant forms, and different entities may share the same name. The term "bush", for instance, may refer to a woody plant, a mechanical fixing, 52 persons and 8 places covered in Wikipedia, and thousands of other persons. To our knowledge, this is the first application of a kernel entity resolution approach to the German Wikipedia as a reference for named entities. We describe both the context of a named entity in Wikipedia and the context of a name phrase detected in a new document by a context vector of relevant features. These features comprise not only the name itself and its variant spellings, but also relevant key terms, other identified named entities, and topic indicators generated by an LDA topic model. We formulate different kernels for comparing these context vectors and use kernel classifiers, e.g. rank classifiers, to determine the right match. Compared with a baseline approach that uses only text similarity, adding topic features yields a much higher F-value, which is comparable to the results published for English. The procedure is also able to detect with high reliability whether a person is not covered by Wikipedia.
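
The matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feature groups, the candidate identifiers, the fixed kernel weights and the rejection threshold are all hypothetical, and a simple weighted sum of cosine kernels stands in for the trained kernel classifiers (e.g. rank classifiers) that the paper uses.

```python
import math
from typing import Dict, Optional

# A context is a set of named feature groups (e.g. "words", "topics"),
# each a sparse vector mapping a feature to its weight. All names and
# values below are hypothetical illustrations.
Vector = Dict[str, float]
Context = Dict[str, Vector]


def cosine_kernel(a: Vector, b: Vector) -> float:
    """Normalised linear kernel (cosine similarity) on sparse vectors."""
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def combined_kernel(x: Context, y: Context, weights: Dict[str, float]) -> float:
    """Weighted sum of per-group kernels; a non-negative weighted sum
    of valid kernels is again a valid kernel."""
    return sum(
        weight * cosine_kernel(x.get(group, {}), y.get(group, {}))
        for group, weight in weights.items()
    )


def resolve(mention: Context, candidates: Dict[str, Context],
            weights: Dict[str, float], threshold: float) -> Optional[str]:
    """Return the best-scoring candidate, or None if no candidate exceeds
    the threshold (the name is treated as not covered by the reference)."""
    best_id, best_score = None, threshold
    for candidate_id, context in candidates.items():
        score = combined_kernel(mention, context, weights)
        if score > best_score:
            best_id, best_score = candidate_id, score
    return best_id


# Hypothetical reference entries sharing the name "bush".
candidates = {
    "bush_politician": {
        "words": {"president": 1.0, "texas": 1.0},
        "topics": {"politics": 0.8, "history": 0.2},
    },
    "bush_plant": {
        "words": {"shrub": 1.0, "garden": 1.0},
        "topics": {"botany": 1.0},
    },
}
weights = {"words": 0.5, "topics": 0.5}  # assumed, not learned

mention = {
    "words": {"president": 1.0, "election": 1.0},
    "topics": {"politics": 1.0},
}
unrelated = {"words": {"guitar": 1.0}, "topics": {"music": 1.0}}

print(resolve(mention, candidates, weights, threshold=0.3))
print(resolve(unrelated, candidates, weights, threshold=0.3))
```

In this sketch the topic group lets the first mention match the politician entry even though only one key term overlaps, while the second mention shares no features with any candidate and is rejected. In the paper the decision function is learned from data rather than set by hand.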

Author(s)

Pilz, Anja

Molzberger, L.

Paaß, Gerhard

Host publication

Text Mining Services - Building and applying text mining based service infrastructures in research and industry

Conference

Conference on Text Mining Services (TMS) 2009
