Fraunhofer-Gesellschaft
2026
Conference Paper
Title

Post-hoc Concept Disentanglement: From Correlated to Isolated Concept Representations

Abstract
Concept Activation Vectors (CAVs) are widely used to model human-understandable concepts as directions within the latent space of neural networks. They are trained by identifying directions from the activations of concept samples to those of non-concept samples. However, this method often produces similar, non-orthogonal directions for correlated concepts, such as "beard" and "necktie" within the CelebA dataset, which frequently co-occur in images of men. This entanglement complicates the interpretation of concepts in isolation and can lead to undesired effects in CAV applications, such as activation steering. To address this issue, we introduce a post-hoc concept disentanglement method that employs a non-orthogonality loss, facilitating the identification of orthogonal concept directions while preserving directional correctness. We evaluate our approach with real-world and controlled correlated concepts in CelebA and a synthetic FunnyBirds dataset with VGG16 and ResNet18 architectures. We further demonstrate the superiority of orthogonalized concept representations in activation steering tasks, allowing (1) the insertion of isolated concepts into input images through generative models and (2) the removal of concepts for effective shortcut suppression with reduced impact on correlated concepts in comparison to baseline CAVs. (Code is available at https://github.com/erenerogullari/cav-disentanglement.)
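The idea of penalizing non-orthogonal concept directions can be illustrated with a minimal sketch. The function names (`cosine`, `non_orthogonality_penalty`), the choice of mean squared pairwise cosine similarity as the penalty, and the two-dimensional toy vectors are assumptions made for illustration; the abstract does not specify the exact form of the loss, and the authors' implementation in the linked repository may differ.

```python
import math

def cosine(u, v):
    """Cosine similarity between two direction vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def non_orthogonality_penalty(cavs):
    """Mean squared cosine similarity over all distinct pairs of directions.

    Hypothetical penalty: it is 0 when all directions are mutually
    orthogonal and grows as directions become more aligned.
    """
    pairs = [(i, j) for i in range(len(cavs)) for j in range(i + 1, len(cavs))]
    return sum(cosine(cavs[i], cavs[j]) ** 2 for i, j in pairs) / len(pairs)

# Toy stand-ins for two correlated concept directions (e.g. "beard", "necktie").
entangled = [[1.0, 0.0], [1.0, 1.0]]
# The same two concepts after an idealized disentanglement.
isolated = [[1.0, 0.0], [0.0, 1.0]]

penalty_entangled = non_orthogonality_penalty(entangled)
penalty_isolated = non_orthogonality_penalty(isolated)
```

In a training setting, a term of this kind would be combined with an objective that keeps each direction aligned with its concept, which corresponds to what the abstract calls preserving directional correctness.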
Author(s)
Erogullari, Eren
Fraunhofer-Institut für Nachrichtentechnik, Heinrich-Hertz-Institut HHI  
Lapuschkin, Sebastian Roland
Fraunhofer-Institut für Nachrichtentechnik, Heinrich-Hertz-Institut HHI  
Samek, Wojciech  
Fraunhofer-Institut für Nachrichtentechnik, Heinrich-Hertz-Institut HHI  
Pahde, Frederik
Fraunhofer-Institut für Nachrichtentechnik, Heinrich-Hertz-Institut HHI  
Mainwork
Explainable Artificial Intelligence. Third World Conference, xAI 2025. Proceedings. Part I  
Conference
World Conference on eXplainable Artificial Intelligence 2025  
Open Access
DOI
10.1007/978-3-032-08317-3_4
Language
English
Fraunhofer-Institut für Nachrichtentechnik, Heinrich-Hertz-Institut HHI  