Title: Extracting Interpretable Hierarchical Rules from Deep Neural Networks' Latent Space
Authors: Wang, Ya; Paschke, Adrian
Type: conference paper
Dates: 2023-10-15; 2024-01-09
Handle: https://publica.fraunhofer.de/handle/publica/458536
DOI: 10.1007/978-3-031-45072-3_17
Language: en
Keywords: Neural network interpretability; Rule-based explanations; Decompositional rule extraction

Abstract: Deep neural networks, known for their superior learning capabilities, excel in identifying complex relationships between inputs and outputs, leveraging hierarchical, distributed data processing. Despite their impressive performance, these networks often resemble "black boxes" due to their highly intricate internal structure and representation, raising challenges in terms of safety, ethical standards, and social norms. Decompositional rule extraction techniques have sought to address these issues by delving into the latent space and retrieving a broad set of symbolic rules. However, the interpretability of these rules is often hampered by their size and complexity. In this paper, we introduce EDICT (Extracting Deep Interpretable Concepts using Trees), a novel approach for rule extraction which employs a hierarchy of decision trees to mine concepts learned in a neural network, thereby generating highly interpretable rules. Evaluations across multiple datasets reveal that our method extracts rules with greater speed and interpretability compared to existing decompositional rule extraction techniques. Simultaneously, our approach demonstrates competitive performance in classification accuracy and model fidelity.
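To make the abstract's notion of decompositional rule extraction concrete, the following is a minimal sketch of the general idea: fit a decision tree on a network's hidden-layer activations so the extracted rules are phrased over the latent space and then scored by fidelity to the network's own predictions. This is not the paper's EDICT algorithm (which uses a hierarchy of trees to mine concepts); the dataset, MLP architecture, single-tree surrogate, and all parameters here are illustrative assumptions.

```python
# Hedged illustration of decompositional rule extraction (NOT EDICT itself):
# a single decision tree fit on hidden-layer activations, explaining the
# network's predictions rather than the ground-truth labels.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 1. Train the "black box": a small MLP with one ReLU hidden layer.
mlp = MLPClassifier(hidden_layer_sizes=(8,), activation="relu",
                    max_iter=2000, random_state=0).fit(X_train, y_train)

def hidden_activations(model, X):
    """Forward pass up to the hidden layer, using the learned weights."""
    return np.maximum(0, X @ model.coefs_[0] + model.intercepts_[0])

# 2. Decompositional step: describe the network's own predictions in terms
#    of its latent representation, not the original labels.
H_train = hidden_activations(mlp, X_train)
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(H_train, mlp.predict(X_train))

# 3. Fidelity: how often the extracted rules agree with the network on
#    held-out data; the printed tree is the human-readable rule set.
H_test = hidden_activations(mlp, X_test)
fidelity = (tree.predict(H_test) == mlp.predict(X_test)).mean()
print(f"fidelity to the network: {fidelity:.2f}")
print(export_text(tree, feature_names=[f"h{i}" for i in range(H_train.shape[1])]))
```

A shallow surrogate tree keeps the rule set small and readable, at the cost of fidelity; the paper's hierarchical approach is aimed at improving that interpretability/fidelity trade-off.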