2024
Master Thesis
Title
Multi-View Pose Estimation Using 2D-3D Correspondences
Abstract
6D pose estimation is the task of estimating the 3D translation and rotation of an object [33] from RGB or RGB-D images. It is a critical component in advancing robotics, augmented reality, and automated systems [19, 57, 52]. The inherent difficulty of this task lies in achieving high precision under challenging conditions such as occlusions, changes in lighting, and texture-less objects, where traditional single-view methods often fall short [38, 24, 56]. Recent techniques rely on deep learning to improve estimation accuracy. Among these, approaches that use a convolutional neural network (CNN) backbone to learn 2D-3D correspondences have garnered significant attention [48, 50, 45]. However, many of these methods have limitations: they cannot be trained end-to-end, lack effective guidance for learning 2D-3D correspondences, and often require training a separate model for each object to achieve state-of-the-art (SotA) performance. These limitations arise from the non-differentiable nature of traditional Perspective-n-Point (PnP) solvers and the limited capacity of the models. Our approach modifies the SotA method GDRNPP [45] to unlock the full potential of end-to-end training, guiding the model to learn better 2D-3D correspondences. We then perform multi-view pose optimization based on the single-view results, aiming to overcome the limitations imposed by the model's capacity and achieve superior performance. The source code is available here: Multi-View-PE-2023.
Thesis Note
Darmstadt, TU, Master Thesis, 2024
Language
English
Keyword(s)
Sector: Automotive Industry
Sector: Healthcare
Sector: Cultural and Creative Economy
Research Line: Computer graphics (CG)
Research Line: Computer vision (CV)
Research Line: Human computer interaction (HCI)
Research Line: Machine learning (ML)
LTA: Interactive decision-making support and assistance systems
LTA: Machine intelligence, algorithms, and data structures (incl. semantics)
3D Computer vision
Machine learning
Pattern recognition
3D Object localisation