Kuijper, Arjan; Pöllabauer, Thomas; Li, Jiayin
2024-11-27
https://publica.fraunhofer.de/handle/publica/479348

6D pose estimation, the task of estimating the translation and rotation of an object in three dimensions [33] from RGB or RGB-D images, is a critical component in advancing robotics, augmented reality, and automated systems [19, 57, 52]. The difficulty of the task lies in achieving high precision under challenging conditions such as occlusions, changes in lighting, and texture-less objects, where traditional single-view methods often fall short [38, 24, 56]. Current state-of-the-art (SotA) techniques rely on deep learning to improve estimation accuracy. Among these, approaches that use a convolutional neural network (CNN) backbone to learn 2D-3D correspondences have garnered significant attention [48, 50, 45]. However, many of these methods have limitations: they cannot be trained end-to-end, they lack effective guidance for learning 2D-3D correspondences, and they often require a separate model to be trained for each object to achieve SotA performance. These limitations arise from the non-differentiable nature of traditional Perspective-n-Point (PnP) solvers and from the limited capacity of the models. Our approach introduces a modification to the SotA method GDRNPP [45]. The objective of this modification is to unlock the full potential of end-to-end training and thereby guide models to learn better 2D-3D correspondences. We then perform multi-view pose optimization on top of the single-view result, aiming to overcome the limitations imposed by the model's capacity and achieve superior performance.
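To make the terms concrete, the following is a minimal sketch of how a 6D pose (rotation plus translation) relates 3D model points to 2D image points, and of the reprojection error that PnP-style solvers typically minimize. It is not taken from the thesis or from GDRNPP: the pinhole camera model, the intrinsics `K`, the helper names, and the toy model points are all assumptions chosen for illustration.

```python
import math

def rotation_z(angle_rad):
    """3x3 rotation matrix about the camera z-axis (illustrative helper)."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def transform(R, t, p):
    """Apply the 6D pose (R, t) to a 3D model point: p_cam = R * p + t."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))

def project(K, p_cam):
    """Pinhole projection of a camera-frame point; K = (fx, fy, cx, cy)."""
    fx, fy, cx, cy = K
    x, y, z = p_cam
    return (fx * x / z + cx, fy * y / z + cy)

def reprojection_error(R, t, K, points_3d, points_2d):
    """Mean pixel distance between observed 2D points and projected 3D points."""
    total = 0.0
    for P, (u, v) in zip(points_3d, points_2d):
        u_hat, v_hat = project(K, transform(R, t, P))
        total += math.hypot(u_hat - u, v_hat - v)
    return total / len(points_3d)

# Assumed camera intrinsics and a toy object model (units: metres).
K = (600.0, 600.0, 320.0, 240.0)
model_points = [(0.1, 0.0, 0.0), (0.0, 0.1, 0.0), (0.0, 0.0, 0.1), (0.1, 0.1, 0.1)]

# Ground-truth pose: 30 degrees about z, one metre in front of the camera.
R_true = rotation_z(math.radians(30.0))
t_true = (0.0, 0.0, 1.0)

# Synthetic 2D-3D correspondences: each model point paired with its projection.
observed = [project(K, transform(R_true, t_true, P)) for P in model_points]

err_true = reprojection_error(R_true, t_true, K, model_points, observed)
err_wrong = reprojection_error(rotation_z(0.0), t_true, K, model_points, observed)
print(f"error at true pose:  {err_true:.3f} px")
print(f"error at wrong pose: {err_wrong:.3f} px")
```

A PnP solver searches for the pose that drives this error toward zero given predicted correspondences. In this sketch the error is a plain function of the pose, so in principle it could be differentiated, which is the kind of property an end-to-end trainable pipeline relies on.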
The source code is available here: Multi-View-PE-2023.

Language: en
Sector: Automotive Industry
Sector: Healthcare
Sector: Cultural and Creative Economy
Research Line: Computer graphics (CG)
Research Line: Computer vision (CV)
Research Line: Human computer interaction (HCI)
Research Line: Machine learning (ML)
LTA: Interactive decision-making support and assistance systems
LTA: Machine intelligence, algorithms, and data structures (incl. semantics)
Keywords: 3D Computer vision; Machine learning; Pattern recognition; 3D Object localisation
Title: Multi-View Pose Estimation Using 2D-3D Correspondences
Type: master thesis