Towards Contradiction Detection in German: A Translation-Driven Approach
With the recent advancements in Machine Learning based Natural Language Processing (NLP), language dependency has always been a limiting factor for a majority of NLP applications. Typically, models are trained for the English language due to the availability of very large labeled and unlabeled datasets, which also allow to fine tune models for that language. Contradiction Detection is one such problem that has found many practical applications in NLP and up to this point has only been studied in the context of English language. The scope of this paper is to examine a set of baseline methods for the Contradiction Detection task on German text. For this purpose, the well-known Stanford Natural Language Inference (SNLI) data set (110,000 sentence pairs) is machine-translated from English to German. We train and evaluate four classifiers on both the original and the translated data, using state-of-the-art textual data representations. Our main contribution is the first large-scale assessment for this problem in German, and a validation of machine translation as a data generation method. We also present a novel approach to learn sentence embeddings by exploiting the hidden states of an encoder-decoder Sequence-To-Sequence RNN trained for autoencoding or translation.