May 2026
Journal Article
Title

JEMA: Joint Embedding of Multimodal and multi-view Alignment in human-centric embedding space for manufacturing

Abstract
This work introduces JEMA (Joint Embedding with Multimodal and multi-view Alignment), a novel co-learning framework and loss function to combine multiple sensors and process parameters in Directed Energy Deposition (DED), a critical process in metal additive manufacturing. As Industry 5.0 advances in industrial applications, effective process monitoring becomes increasingly essential. However, the limited availability of data and the black-box nature of AI solutions present significant implementation challenges in industrial settings. JEMA addresses these limitations by leveraging multimodal data, including multi-view images and process parameters, to learn transferable semantic representations. By implementing a supervised regression contrastive loss function, JEMA shapes the embedding space to enable interpretable inference. Furthermore, the framework allows for simplified hardware requirements and reduced computational overhead during deployment by utilizing only the primary on-axis sensor. We evaluate the effectiveness of the JEMA loss in DED process monitoring, with particular focus on its generalization capabilities for downstream tasks such as melt pool geometry prediction without extensive fine-tuning. Our empirical results demonstrate the effectiveness of JEMA, showing improvements of 29% and 20% in multimodal and unimodal settings, respectively, compared to models without any regularization loss. Additionally, JEMA outperforms supervised contrastive learning methods by 8% and 2% in the same settings. These improvements are also accompanied by a more structured and meaningful representation in the embedding space. Importantly, the learned embedding representation provides direct interpretability of the feature space, which can be utilized by both human operators and automated systems for process optimization, control, and anomaly detection based on defined thresholds.
This human-centered approach ensures that operators can actively engage with the system, making informed decisions and enhancing their trust in the process. Our framework establishes a foundation for integrating multisensor data with metadata, enabling diverse downstream applications both within manufacturing processes and beyond, while keeping human expertise central to the loop.
Author(s)
Sousa, João
University of Porto, Faculty of Engineering -FEUP-  
Darabi, Roya
University of Porto, Faculty of Engineering -FEUP-  
Sousa, Armando
University of Porto, Faculty of Engineering -FEUP-  
Brückner, Frank  
Fraunhofer-Institut für Werkstoff- und Strahltechnik IWS  
Reis, Luís Paulo
University of Porto, Faculty of Engineering -FEUP-  
Reis, Ana
University of Porto, Faculty of Engineering -FEUP-  
Journal
Computer vision and image understanding : CVIU  
DOI
10.1016/j.cviu.2026.104771
Language
English
Fraunhofer-Institut für Werkstoff- und Strahltechnik IWS  
Keyword(s)
  • Artificial intelligence
  • Transference
  • Embedding representation
  • Contrastive learning
  • Human-in-the-loop
  • Additive manufacturing
