Multi-View Pose Estimation Using 2D-3D Correspondences

Li, Jiayin

2024

Master Thesis

Abstract

6D pose estimation is the task of estimating the translation and rotation of an object in three dimensions [33] from RGB or RGB-D images. It is a critical component in advancing robotics, augmented reality, and automated systems [19, 57, 52]. The inherent difficulty of this task lies in achieving high precision under challenging conditions such as occlusion, changes in lighting, and texture-less objects, where traditional single-view methods often fall short [38, 24, 56]. Cutting-edge techniques rely on deep learning to improve estimation accuracy. Among these, approaches that use a convolutional neural network (CNN) backbone to learn 2D-3D correspondences have garnered significant attention [48, 50, 45]. However, many of these methods have limitations: they cannot be trained end-to-end, lack effective guidance for learning 2D-3D correspondences, and often require training a separate model for each object to achieve state-of-the-art (SotA) performance. These limitations arise from the non-differentiable nature of traditional Perspective-n-Point (PnP) solvers and from the limited capacity of the models. Our approach modifies the SotA method GDRNPP [45] to unlock the full potential of end-to-end training, guiding the model to learn better 2D-3D correspondences. We then perform multi-view pose optimization based on the single-view results, aiming to overcome the limitations imposed by model capacity and achieve superior performance. The source code is available here: Multi-View-PE-2023.

Thesis Note

Darmstadt, TU, Master Thesis, 2024

Author(s)

Li, Jiayin

Fraunhofer-Institut für Graphische Datenverarbeitung IGD

Advisor(s)

Kuijper, Arjan