2025
Journal Article
Title

Mechanistic understanding and validation of large AI models with SemanticLens

Abstract
Unlike human-engineered systems, such as aeroplanes, for which the role and dependencies of each component are well understood, the inner workings of artificial intelligence models remain largely opaque, which hinders verifiability and undermines trust. Current approaches to neural network interpretability, including input attribution methods, probe-based analysis and activation visualization techniques, typically provide limited insights about the role of individual components or require extensive manual interpretation that cannot scale with model complexity. This paper introduces SemanticLens, a universal explanation method for neural networks that maps hidden knowledge encoded by components (for example, individual neurons) into the semantically structured, multimodal space of a foundation model such as CLIP. In this space, unique operations become possible, including (1) textual searches to identify neurons encoding specific concepts, (2) systematic analysis and comparison of model representations, (3) automated labelling of neurons and explanation of their functional roles, and (4) audits to validate decision-making against requirements. Fully scalable and operating without human input, SemanticLens is shown to be effective for debugging and validation, summarizing model knowledge, aligning reasoning with expectations (for example, adherence to the ABCDE rule in melanoma classification) and detecting components tied to spurious correlations and their associated training data. By enabling component-level understanding and validation, the proposed approach helps mitigate the opacity that limits confidence in artificial intelligence systems compared to traditional engineered systems, enabling more reliable deployment in critical applications.
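The textual search and automated labelling operations listed in the abstract rest on comparing vectors in a shared embedding space. The following is a minimal sketch of that idea, not the authors' SemanticLens implementation. It assumes that each component (neuron) is summarized by the mean embedding of its most strongly activating input samples, and that text queries and candidate labels are embedded into the same space. The toy 3-dimensional vectors stand in for real foundation-model (e.g. CLIP) embeddings. All function names and data are hypothetical.

```python
import math

# Assumption: toy 3-dimensional vectors stand in for real foundation-model
# (e.g. CLIP) embeddings; in practice they would come from the model's
# image and text encoders.

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)

def neuron_embedding(sample_embeddings):
    """Hypothetical: summarize a neuron by the mean embedding of its top-activating samples."""
    n = len(sample_embeddings)
    return [sum(dim) / n for dim in zip(*sample_embeddings)]

def search(neuron_embs, query_emb, k=1):
    """Textual search: return the k neurons most similar to a text query embedding."""
    ranked = sorted(neuron_embs, key=lambda name: cosine(neuron_embs[name], query_emb), reverse=True)
    return ranked[:k]

def label(neuron_emb, label_embs):
    """Automated labelling: pick the candidate label whose embedding is most similar."""
    return max(label_embs, key=lambda lbl: cosine(neuron_emb, label_embs[lbl]))

# Hypothetical text embeddings for three candidate concept labels.
label_embs = {"fur": [1.0, 0.0, 0.0], "wheel": [0.0, 1.0, 0.0], "text": [0.0, 0.0, 1.0]}

# Hypothetical embeddings of each neuron's top-activating samples.
top_samples = {
    "n1": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "n2": [[0.1, 0.9, 0.0], [0.0, 1.0, 0.1]],
    "n3": [[0.0, 0.2, 0.9], [0.1, 0.0, 1.0]],
}
neuron_embs = {name: neuron_embedding(s) for name, s in top_samples.items()}

print(search(neuron_embs, label_embs["wheel"]))                     # ['n2']
print({name: label(e, label_embs) for name, e in neuron_embs.items()})  # {'n1': 'fur', 'n2': 'wheel', 'n3': 'text'}
```

Retrieval here is a plain linear scan with cosine similarity. The abstract does not specify how similarity is computed or indexed at scale, so this choice is illustrative only.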
Author(s)
Dreyer, Maximilian
Fraunhofer-Institut für Nachrichtentechnik, Heinrich-Hertz-Institut HHI  
Berend, Jim
Fraunhofer-Institut für Nachrichtentechnik, Heinrich-Hertz-Institut HHI  
Labarta, Tobias
Fraunhofer-Institut für Nachrichtentechnik, Heinrich-Hertz-Institut HHI  
Vielhaben, Johanna
Fraunhofer-Institut für Nachrichtentechnik, Heinrich-Hertz-Institut HHI  
Wiegand, Thomas  
Fraunhofer-Institut für Nachrichtentechnik, Heinrich-Hertz-Institut HHI  
Lapuschkin, Sebastian Roland
Fraunhofer-Institut für Nachrichtentechnik, Heinrich-Hertz-Institut HHI  
Samek, Wojciech  
Fraunhofer-Institut für Nachrichtentechnik, Heinrich-Hertz-Institut HHI  
Journal
Nature Machine Intelligence
Open Access
Rights
CC BY 4.0: Creative Commons Attribution
DOI
10.1038/s42256-025-01084-w
10.24406/publica-5197
Language
English
Fraunhofer-Institut für Nachrichtentechnik, Heinrich-Hertz-Institut HHI  