2023
Conference Paper
Title
Interpreting Black-box Machine Learning Models for High Dimensional Datasets
Abstract
Many datasets are of increasingly high dimensionality, where a large number of features could be irrelevant to the learning task. Including such features not only introduces unwanted noise but also increases computational complexity. Deep neural networks (DNNs) outperform classical machine learning (ML) algorithms in a variety of applications due to their effectiveness in modelling complex problems and handling high-dimensional datasets. However, due to non-linearity and higher-order feature interactions, DNN models are unavoidably opaque, making them black-box methods. In contrast, an interpretable model can identify statistically significant features and explain how they affect the model's outcome. In this paper, we propose a novel method to improve the interpretability of black-box models for high-dimensional datasets. First, a black-box model is trained on the full feature space and learns useful embeddings on which the classification is performed. To decompose the inner principles of the black-box and to identify the top-k important features (global explainability), probing and perturbing techniques are applied. An interpretable surrogate model is then trained on the top-k feature space to approximate the black-box. Finally, decision rules and counterfactuals are derived from the surrogate to provide local decisions. Our approach outperforms tabular learners, e.g., TabNet and XGBoost, and SHAP-based interpretability techniques, when tested on a number of datasets with dimensionality between 54 and 20,531. (GitHub: https://github.com/rezacsedu/DeepExplainHidim)
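The pipeline sketched in the abstract (black-box on the full feature space, perturbation-based ranking of top-k features, interpretable surrogate, rule extraction) can be illustrated with off-the-shelf scikit-learn components. This is a minimal sketch under stated assumptions: the paper's actual DNN architecture, probing method, and counterfactual derivation are not specified here, so an MLP, permutation importance, and a shallow decision tree stand in for them.

```python
# Hedged sketch of the described pipeline; scikit-learn components stand in
# for the paper's own models and probing/perturbing techniques.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

# High-dimensional synthetic data: most features are irrelevant by design.
X, y = make_classification(n_samples=600, n_features=100, n_informative=8,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# 1) Train a black-box model on the full feature space.
black_box = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500,
                          random_state=0).fit(X_tr, y_tr)

# 2) Perturb inputs to rank features and keep the top-k (global explanation).
k = 10
imp = permutation_importance(black_box, X_te, y_te, n_repeats=5,
                             random_state=0)
top_k = np.argsort(imp.importances_mean)[::-1][:k]

# 3) Fit an interpretable surrogate on the top-k feature space, targeting the
#    black-box's predictions (approximating the black-box, not the labels).
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X_tr[:, top_k], black_box.predict(X_tr))

# 4) Local decisions: read decision rules off the surrogate tree, and measure
#    fidelity as agreement with the black-box on held-out data.
fidelity = surrogate.score(X_te[:, top_k], black_box.predict(X_te))
rules = export_text(surrogate, feature_names=[f"f{i}" for i in top_k])
```

The surrogate is scored against the black-box's outputs rather than the ground-truth labels, since its purpose is to mimic the black-box faithfully on the reduced feature space.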
Author(s)
Mainwork
2023 IEEE 10th International Conference on Data Science and Advanced Analytics (DSAA 2023) Proceedings
Conference
10th IEEE International Conference on Data Science and Advanced Analytics, DSAA 2023