Forcing Interpretability for Deep Neural Networks through Rule-based Regularization
Remarkable progress in machine learning is driving research in many application domains. In some of these domains, the output of machine learning algorithms must be interpretable. In this paper, we propose a rule-based regularization technique that enforces interpretability for neural networks (NNs). To this end, we train a rule-based surrogate model simultaneously with the NN. From the surrogate, we derive a metric quantifying its degree of explainability, which is fed back into the training of the NN as a regularization term. We evaluate our model on four datasets and compare it to unregularized models as well as a decision tree (DT) based baseline. The rule-based regularization approach achieves interpretability and competitive accuracy.
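
As a rough illustration of the training scheme described above, the regularized objective can be sketched in the following generic form. All symbols here are assumptions introduced for illustration and are not notation taken from the paper: $\theta$ denotes the NN parameters, $f_\theta$ the network, $\mathcal{L}_{\mathrm{task}}$ the usual prediction loss, $S_\theta$ the rule-based surrogate fitted to $f_\theta$ during training, $\Omega$ the explainability metric derived from the surrogate (assumed here to be oriented so that lower values mean a more explainable surrogate), and $\lambda \ge 0$ a weight on the regularization term.

```latex
% Sketch under stated assumptions; the symbols are illustrative only.
\begin{equation*}
  \min_{\theta} \;
  \mathcal{L}_{\mathrm{task}}\bigl(f_\theta\bigr)
  \;+\; \lambda \, \Omega\bigl(S_\theta\bigr),
  \qquad \lambda \ge 0
\end{equation*}
```

Under this reading, $\lambda = 0$ corresponds to the unregularized models used for comparison, and larger values of $\lambda$ put more weight on the explainability of the surrogate relative to the prediction loss.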