Noise Reduction in Distant Supervision using Probabilistic Soft Logic
Traditional supervised relation extraction systems mostly rely on manually labeled training datasets, and generating these datasets is often an expensive and time-consuming process. Distant supervision automates the generation of training data by matching the contents of a knowledge base to corresponding text. However, although distant supervision has been shown to be an efficient method, it produces noisy training samples, which can degrade the performance of relation extraction systems. This thesis focuses on reducing this noise in training datasets generated by distant supervision in order to improve the performance of the final extraction model. To address this problem, a probabilistic model is proposed that jointly reasons over relation candidates and entity types. The model is developed using the Probabilistic Soft Logic (PSL) framework, which makes it possible to express domain knowledge intuitively as first-order logic rules and to infer a label for each candidate relation using a probabilistic reasoner. The proposed PSL model is analyzed and compared against two baselines: a brute-force approach with hard-coded rules and a Markov Logic Network model. The experimental results show that the proposed method outperforms both baselines. Moreover, the proposed approach improves not only the quality of the training dataset generated by distant supervision, but also the performance of a state-of-the-art relation extraction model.
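The distant supervision heuristic summarized above can be sketched in a few lines: any sentence mentioning both entities of a knowledge-base fact is labeled with that fact's relation, which is exactly how noisy positives arise. This is a minimal illustration, not code from the thesis; the knowledge base, entities, and function name are hypothetical.

```python
# Hypothetical toy knowledge base: (entity1, entity2) -> relation.
KB = {
    ("Bonn", "Germany"): "located_in",
    ("Einstein", "Ulm"): "born_in",
}

def distant_label(sentence, entity_pair):
    """Label a sentence with a KB relation if it mentions both entities.

    This is the distant supervision assumption: co-occurrence of the
    entity pair is taken as evidence that the sentence expresses the
    KB relation, even when it does not (a noisy positive).
    """
    e1, e2 = entity_pair
    if e1 in sentence and e2 in sentence:
        return KB.get(entity_pair)
    return None

# Correct label: the sentence really expresses located_in.
print(distant_label("Bonn is a city in Germany.", ("Bonn", "Germany")))   # located_in
# Noisy label: both entities co-occur, but born_in is not expressed here.
print(distant_label("Einstein left Ulm as a child.", ("Einstein", "Ulm")))  # born_in
```

The thesis targets exactly the second case: candidates produced by such matching are re-scored with a PSL model instead of being accepted wholesale.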
Bonn, Univ., Master Thesis, 2019