Improving Language Model Performance by Training on Prototypical Contradictions

Pielka, Maren; Freischlad, Marie-Christin; Schmidt, Svetlana; Sifa, Rafet

doi:10.1007/978-3-031-88714-7_12

2025

Conference Paper

Abstract

We present an informed approach to augmenting existing contradiction detection datasets with prototypical examples for language model training. The samples are created by combining linguistic knowledge with the generative capabilities of current large language models. Specifically, we investigate three approaches: rule-based augmentation, data generation using GPT models with few-shot prompting, and a combination of both. We find that adding prototypical samples to the training data helps to significantly reduce the training set size while maintaining or even improving performance on the downstream task.