2023
Book Article
Title
Explainable deep learning: concepts, methods, and new developments
Abstract
Explainable AI (XAI) is an emerging research field that brings transparency to highly complex and opaque machine learning (ML) models. In recent years, various techniques have been proposed to explain and understand ML models that were previously widely considered black boxes (e.g., deep neural networks) and to verify their predictions. Surprisingly, the prediction strategies of these models at times turned out to be flawed and not aligned with human intuition, e.g., due to biases or spurious correlations in the training data. Recent endeavors of the XAI community aim to move beyond the mere identification of such flawed behaviors toward the integration of explanations into the training process in order to improve model efficiency, robustness, and generalization. This chapter introduces the reader to concepts, methods, and recent developments in the field of explainable AI. After discussing what it means to “explain” in the context of machine learning and presenting useful desiderata for explanations, we give a brief overview of established explanation techniques, focusing mainly on the so-called attribution methods, which explain a model's decisions by assigning an importance score post hoc to every input dimension (e.g., pixel). Furthermore, we discuss recent developments in XAI; in particular, we present ways to use explanations to effectively prune, debug, and improve a given model, as well as the concept of neuralization, which allows XAI methods developed specifically for neural network classifiers to be easily transferred to other types of models and tasks. The chapter concludes with a discussion of the limits of current explanation methods and promising future research directions.
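As a rough illustration of the attribution methods mentioned in the abstract, below is a minimal gradient × input sketch in PyTorch; the toy model and input are hypothetical stand-ins, and this is only one of many attribution techniques (the chapter itself covers a broader range of methods).

    import torch
    import torch.nn as nn

    # Hypothetical toy classifier; any differentiable model works the same way.
    model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 3))
    model.eval()

    x = torch.randn(1, 4, requires_grad=True)  # one input with 4 dimensions

    # Forward pass; explain the score of the predicted class.
    scores = model(x)
    target = scores.argmax(dim=1).item()
    scores[0, target].backward()

    # Gradient x Input: one post hoc importance score per input dimension.
    attribution = (x.grad * x).detach().squeeze()
    print(attribution)

Each resulting score indicates how much the corresponding input feature contributed to the predicted class; applied to an image classifier, the same idea yields the per-pixel relevance maps that attribution methods are known for.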