2020
Conference Paper
Title
Identification of Spurious Labels in Machine Learning Data Sets using N-Version Validation
Abstract
Machine learning components are becoming popular in the automotive industry, and more and more data sets are becoming available for training them. All of these data sets provide ground truth labels for images. The labeling process is expensive and potentially error-prone. At the same time, label correctness defines the business value of a data set. In this paper, we use an N-version approach to assess the label quality of a data set. The approach combines N state-of-the-art neural networks and aggregates their results into a single verdict using majority voting. We compare this majority vote against the ground truth label and compute the percentage of disagreeing pixels along with other metrics, enabling automated and detailed analysis of label quality in data sets. We evaluate our methodology by classifying the BDD100K drivable area data set. The evaluation shows that the approach identifies misclassified scenes or inconsistencies between label semantics for similar scenes.
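The voting and comparison steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `majority_vote` and `disagreement_percentage`, the integer class ids, and the tie-breaking rule (the label seen first among the N predictions wins) are all assumptions made for illustration, and small hard-coded label maps stand in for the outputs of the neural networks.

```python
from collections import Counter


def majority_vote(predictions):
    """Pixel-wise majority vote over N label maps of identical shape.

    `predictions` is a list of N label maps, each a list of rows of
    integer class ids (hypothetical encoding, not taken from the paper).
    """
    n_rows = len(predictions[0])
    n_cols = len(predictions[0][0])
    vote = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            counts = Counter(p[r][c] for p in predictions)
            # Assumption: on a tie, the label encountered first wins,
            # which is how Counter.most_common orders equal counts.
            row.append(counts.most_common(1)[0][0])
        vote.append(row)
    return vote


def disagreement_percentage(vote, ground_truth):
    """Percentage of pixels where the vote differs from the ground truth."""
    total = sum(len(row) for row in ground_truth)
    differing = sum(
        v != g
        for vote_row, gt_row in zip(vote, ground_truth)
        for v, g in zip(vote_row, gt_row)
    )
    return 100.0 * differing / total


# Three hypothetical network outputs for a 2x3 image
# (assumed ids: 0 = background, 1 and 2 = two drivable-area classes).
net_a = [[0, 1, 1], [0, 1, 2]]
net_b = [[0, 1, 0], [0, 1, 2]]
net_c = [[0, 0, 1], [1, 1, 2]]
ground_truth = [[0, 1, 1], [0, 2, 2]]

vote = majority_vote([net_a, net_b, net_c])
score = disagreement_percentage(vote, ground_truth)
print(vote, round(score, 2))
```

In this toy case the vote differs from the ground truth at one of six pixels. A high disagreement percentage would flag the image for manual review; whether the label or the networks are wrong still has to be decided by inspection.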