Explainable Machine Learning Methods for Anomaly Detection in Data Quality Assurance

Widjaja, Raynard

2025

Master Thesis

Abstract

In the modern digital era, data is one of the most valuable assets of an organization and has a tremendous impact on its long-term success and decision-making processes. Data-driven decision-making is at the center of modern enterprises and institutions. However, the data that organizations and institutions receive every day may be flawed, which can negatively affect their performance. Data Quality Assurance (DQA) is a proven approach to ensuring that received data is correct, reliable, consistent, and complete. For financial data, a DQA approach can detect flaws in the data and mitigate potential problems and risk factors. An effective way to perform DQA is to use machine learning for anomaly detection. However, the black-box nature of machine learning models might hinder their adoption. Explainable machine learning (XAI) can be a solution to this problem. In this thesis, we leverage XAI for DQA and address the need for an interactive tool that allows users with non-technical backgrounds to easily identify anomalies in data, integrated with explanations from the machine learning model. While there are many works on anomaly detection in the financial domain, there is a lack of tools specifically designed for DQA. This thesis addresses that gap by developing an interactive DQA tool that lets users efficiently find anomalies in the data. The thesis includes an overview of related work, a comparison of various machine learning algorithms for anomaly detection, and the presentation of the interactive DQA tool. XAI was implemented and integrated into the tool. A qualitative evaluation was conducted with domain experts, who performed tasks using the tool and completed a questionnaire. Quantitative evaluations of the model’s performance and explanations were also carried out. While the autoencoder’s performance did not meet expectations, the isolation forest performed adequately. The tool proved effective in helping users find anomalies, and the study results indicated excellent usability. Several areas for improvement in the model and tool were identified, highlighting opportunities for future research.

Thesis Note

Darmstadt, TU, Master Thesis, 2025

Author(s)

Widjaja, Raynard