November 13, 2024
Master Thesis
Title
Contrastive Attribution Learning in Explainable Neural Text Classification
Abstract
With the increasing use of machine learning methods in critical domains such as healthcare, law, and security, the need to unravel the black-box nature of these models, typically deep neural networks, has grown substantially. Several methods have been introduced to explain a model's decisions and predictions; widely used methods, for example, assign attribution scores to the most influential input features, and the highest-attributed features then serve as the explanation rationale. However, such explanations may be ambiguous and can therefore appear equally plausible for alternative decisions. This thesis pursues a contrastive learning approach to differentiate explanations across similar prediction candidates in the domain of Natural Language Processing. The proposed method builds on a novel approach from robust machine learning for aligning the explanations of similar instances, incorporated here into a triplet learning scheme. The counterexamples in each triplet are constructed artificially via perturbations in the embedding space, which is intended to preserve the syntactic structure of the input and thus allow a direct comparison of attribution scores. Results show that, although the generated examples were out of distribution, the attributions for some class comparisons indeed revealed fewer shared features as explanations.
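
The following is a minimal sketch of the triplet scheme the abstract describes, assuming a PyTorch classifier over precomputed embeddings, gradient-times-input attributions, and a single signed-gradient step as the embedding-space perturbation. The model architecture, the step size eps, and the margin are illustrative assumptions, not the configuration used in the thesis.

import torch
import torch.nn.functional as F

torch.manual_seed(0)

class Classifier(torch.nn.Module):
    """Toy text classifier operating directly on input embeddings."""
    def __init__(self, dim: int = 32, num_classes: int = 3):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(dim, 64), torch.nn.ReLU(),
            torch.nn.Linear(64, num_classes),
        )

    def forward(self, emb: torch.Tensor) -> torch.Tensor:
        return self.net(emb)

def attribution(model, emb, target):
    """Gradient-times-input attribution of the target logit w.r.t. the embedding."""
    emb = emb.detach().requires_grad_(True)
    logit = model(emb)[torch.arange(emb.size(0)), target].sum()
    grad, = torch.autograd.grad(logit, emb, create_graph=True)
    return grad * emb

def perturb(model, emb, target, eps=0.1):
    """Construct a counterexample by nudging the embedding toward `target`
    with one signed-gradient descent step on the cross-entropy; the token
    positions (and hence the syntactic structure) are left intact."""
    emb = emb.detach().requires_grad_(True)
    loss = F.cross_entropy(model(emb), target)
    grad, = torch.autograd.grad(loss, emb)
    return (emb - eps * grad.sign()).detach()

model = Classifier()
emb = torch.randn(8, 32)                # stand-in sentence embeddings
y = torch.randint(0, 3, (8,))           # gold labels
y_contrast = (y + 1) % 3                # a competing class per instance

# Anchor: attribution for the gold class on the original input.
# Positive: attribution on an input perturbed further toward the gold class.
# Negative: attribution on an input perturbed toward the contrast class.
a = attribution(model, emb, y)
p = attribution(model, perturb(model, emb, y), y)
n = attribution(model, perturb(model, emb, y_contrast), y_contrast)

# The triplet loss pulls explanations of same-class instances together and
# pushes apart the explanation of the contrast-class counterexample.
loss = F.triplet_margin_loss(a.flatten(1), p.flatten(1), n.flatten(1), margin=1.0)
loss.backward()
print(f"triplet attribution loss: {loss.item():.4f}")

Because the attributions are computed with create_graph=True, the triplet loss is differentiable with respect to the model parameters, so this term could in principle be added to the classification objective during training.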
Thesis Note
Tübingen, Univ., Master Thesis, 2024
Author(s)
Advisor(s)
Rights
Use according to copyright law
Language
English