Fraunhofer-Gesellschaft

Publica

Here you can find scientific publications from the Fraunhofer institutes.

Noise Reduction in Distant Supervision for Relation Extraction Using Probabilistic Soft Logic

Authors: Kirsch, Birgit; Niyazova, Zamira; Mock, Michael; Rüping, Stefan

Cellier, P. (Ed.): Machine Learning and Knowledge Discovery in Databases. Proceedings. Pt. II: International Workshops of ECML PKDD 2019, Würzburg, Germany, September 16-20, 2019
Cham: Springer Nature, 2020 (Communications in computer and information science 1168)
ISBN: 978-3-030-43886-9 (Print)
ISBN: 978-3-030-43887-6 (Online)
ISBN: 978-3-030-43888-3
pp. 63-78
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD) <2019, Würzburg>
Bundesministerium für Bildung und Forschung BMBF (Germany)
01S18038B; ML2R
English
Conference Paper
Fraunhofer IAIS
Probabilistic Soft Logic; Statistical Relational Learning; Distant Supervision; Relation Extraction; Natural Language Processing

Abstract
The performance of modern relation extraction systems depends to a great degree on the size and quality of the underlying training corpus, and in particular on its labels. Since generating these labels with human annotators is expensive, Distant Supervision has been proposed to automatically align entities in a knowledge base with a text corpus to generate annotations. However, this approach introduces noise, which negatively affects the performance of relation extraction systems. To tackle this problem, we propose a probabilistic graphical model that simultaneously incorporates different sources of knowledge, such as domain experts' knowledge about the context and linguistic knowledge about the sentence structure, in a principled way. The model is defined using the declarative language provided by Probabilistic Soft Logic. Experimental results show that, compared to the original distantly supervised set, the proposed approach improves not only the quality of the generated training data sets but also the performance of the final relation extraction model.
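The abstract combines two mechanisms. Distant supervision labels a sentence with a relation whenever the sentence mentions both entities of a knowledge-base triple. Probabilistic Soft Logic (PSL) encodes knowledge as weighted first-order rules whose ground atoms take soft truth values in [0, 1]. The toy Python sketch below shows how the two can fit together. It is not the authors' model: the knowledge base, sentences, keyword list, rules, and weights are all hypothetical. Conjunction uses the Łukasiewicz t-norm, and each rule contributes a weighted squared hinge loss on its distance to satisfaction, as in PSL's hinge-loss Markov random fields. Because no rule here couples two ground atoms, MAP inference reduces to a one-dimensional grid search per atom, which stands in for the convex optimization a real PSL engine performs.

```python
"""Toy sketch (hypothetical): distant supervision labels refined by PSL-style soft rules."""

# Hypothetical knowledge base of (subject, relation, object) triples.
KB = [("Barack Obama", "born_in", "Honolulu")]

# Hypothetical unlabeled corpus.
SENTENCES = [
    "Barack Obama was born in Honolulu.",
    "Barack Obama gave a speech in Honolulu.",
]

# Hypothetical domain-expert knowledge: context words that signal a relation.
KEYWORDS = {"born_in": ["born"]}


def distant_labels(sentences, kb):
    """Label sentence i with relation r if it mentions both entities of a KB triple."""
    return [
        (i, rel)
        for i, sent in enumerate(sentences)
        for subj, rel, obj in kb
        if subj in sent and obj in sent
    ]


def has_keyword(sentence, rel):
    """Truth value 1.0 if any keyword for rel occurs in the sentence, else 0.0."""
    return 1.0 if any(k in sentence.lower() for k in KEYWORDS.get(rel, [])) else 0.0


def luk_and(a, b):
    """Lukasiewicz t-norm, the soft conjunction used by PSL."""
    return max(0.0, a + b - 1.0)


def distance(body, head):
    """Distance to satisfaction of the soft rule body -> head."""
    return max(0.0, body - head)


# Hypothetical weighted rules over ds = DSLabel(s, r), kw = HasKeyword(s, r),
# and the unknown y = Rel(s, r); each lambda returns the distance to satisfaction.
RULES = [
    (1.0, lambda ds, kw, y: distance(ds, y)),                           # DSLabel -> Rel
    (2.0, lambda ds, kw, y: distance(luk_and(ds, kw), y)),              # DSLabel & HasKeyword -> Rel
    (3.0, lambda ds, kw, y: distance(luk_and(ds, 1.0 - kw), 1.0 - y)),  # DSLabel & !HasKeyword -> !Rel
]


def infer(ds, kw, steps=100):
    """MAP value of Rel(s, r): grid search minimizing the sum of weighted squared hinge losses."""
    def energy(y):
        return sum(w * f(ds, kw, y) ** 2 for w, f in RULES)
    return min((i / steps for i in range(steps + 1)), key=energy)


if __name__ == "__main__":
    for i, rel in distant_labels(SENTENCES, KB):
        y = infer(1.0, has_keyword(SENTENCES[i], rel))
        print(f"{SENTENCES[i]!r} {rel}: {y:.2f}")
```

Run as a script, the sketch keeps the first sentence's `born_in` label at 1.00 and down-weights the second sentence, a typical distant-supervision false positive, to 0.25 because it lacks the hypothetical `born` context keyword. The squared hinge is a deliberate choice: it yields graded truth values instead of the all-or-nothing values a linear hinge tends to produce in this one-variable case.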

URL: http://publica.fraunhofer.de/documents/N-593470.html