Self-Supervised Generative-Contrastive Learning of Multi-Modal Euclidean Input for 3D Shape Latent Representations: A Dynamic Switching Approach

Wu, Chengzhi; Pfrommer, Julius; Zhou, Mingyuan; Beyerer, Jürgen

doi:10.1109/tmm.2023.3338079

2024

Journal Article

Abstract

We propose a combined generative and contrastive neural architecture for learning latent representations of 3D volumetric shapes. The architecture uses two encoder branches, one for voxel grids and one for multi-view images of the same underlying shape. The main idea is to combine a contrastive loss between the resulting latent representations with an additional reconstruction loss. This helps keep the latent representations from collapsing, a trivial solution that would otherwise minimize the contrastive loss. A novel dynamic switching approach is used to cross-train the two encoders with a shared decoder. The switching approach also enables a stop-gradient operation on a randomly chosen branch. Further classification experiments show that the latent representations learned with our self-supervised method implicitly integrate more useful information from the additional input data, thus leading to better reconstruction and classification performance.
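
As a rough illustration of the idea in the abstract, the following minimal sketch computes a contrastive term between the two latent codes plus a reconstruction term from a shared decoder, with a random choice of the active branch. It is not the authors' implementation: the cosine-distance contrastive term, the mean-squared-error reconstruction term, the 50/50 switching rule, and all names (`cosine_distance`, `combined_loss`, `decode`, `weight`) are assumptions made for illustration only. NumPy has no automatic differentiation, so the stop-gradient is only marked in a comment.

```python
import numpy as np


def cosine_distance(a, b):
    """Mean of (1 - cosine similarity) over paired rows of a and b."""
    a_n = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_n = b / np.linalg.norm(b, axis=1, keepdims=True)
    return float(np.mean(1.0 - np.sum(a_n * b_n, axis=1)))


def combined_loss(z_voxel, z_image, decode, target, rng, weight=1.0):
    """Forward pass of an assumed contrastive + reconstruction objective.

    z_voxel, z_image: latent codes from the two encoder branches, shape (batch, dim)
    decode:           shared decoder, a callable mapping latents to reconstructions
    target:           ground-truth shapes to reconstruct
    rng:              a numpy.random.Generator used for the branch switch
    """
    # Assumed switching rule: pick one branch at random to feed the decoder.
    active = "voxel" if rng.random() < 0.5 else "image"
    z_active = z_voxel if active == "voxel" else z_image
    # In an autodiff framework the other branch would pass through a
    # stop-gradient here, so that it only acts as a fixed target.
    z_frozen = z_image if active == "voxel" else z_voxel

    contrastive = cosine_distance(z_active, z_frozen)
    reconstruction = float(np.mean((decode(z_active) - target) ** 2))
    total = contrastive + weight * reconstruction
    return total, contrastive, reconstruction, active
```

Under these assumptions, two encoders that both output the same constant vector drive the contrastive term to zero, but the reconstruction term stays positive unless the decoder can still recover each shape, which is the collapse-avoidance argument made in the abstract.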

Author(s)

Wu, Chengzhi

Fraunhofer-Institut für Optronik, Systemtechnik und Bildauswertung IOSB

Pfrommer, Julius

Fraunhofer-Institut für Optronik, Systemtechnik und Bildauswertung IOSB

Zhou, Mingyuan

Beyerer, Jürgen

Fraunhofer-Institut für Optronik, Systemtechnik und Bildauswertung IOSB

Journal

IEEE Transactions on Multimedia
