September 25, 2024
Conference Paper
Title
Image Dataset Quality Assessment Through Descriptive Out-of-Distribution Detection
Abstract
Out-of-distribution detection ensures trustworthiness in machine learning systems by detecting anomalous data points and adjusting confidence in predictions accordingly. Another key use case of out-of-distribution detection is assessing data quality with respect to a desired distribution or semantic range of data. This work proposes a simple but powerful approach that cleans image data by descriptively defining both desired and undesired data. Notably, the method does not require training a machine learning model. In addition, this work presents a new image dataset suited to evaluating data cleaning tasks in a practically relevant way, and demonstrates satisfactory experimental results.