Improving Zero-Shot Template-Based 6D Pose Estimation with Geometric Features

Authors: Pöllabauer, Thomas; Weyel, Johannes Harro; Knauthe, Volker; Berkei, Sarah; Kuijper, Arjan
Date: 2025-01-28
Year: 2025
URL: https://publica.fraunhofer.de/handle/publica/483023
DOI: 10.1007/978-3-031-77392-1_4

Abstract: 6D object pose estimation is a fundamental problem in robotics and augmented reality. Most of today's state-of-the-art approaches rely on deep learning and require large sets of training images depicting the target objects. A growing number of algorithms try to generalize from a set of known objects, available for training, to unseen objects at test time. Among those, GigaPose is a template-based approach that renders the target object in an onboarding phase shortly before inference and uses learned latent codes of these renderings and of the observed objects for feature matching. While learned representations prove powerful in a wide range of tasks, we propose integrating additional, purely geometric features, which can be extracted at almost no cost from the available 3D meshes during the onboarding phase. This representation is then used as an additional input for template matching and 2D-2D correspondence matching in our approach. We consider multiple relevant features and, implementing one of them, demonstrate improved performance on the core datasets of the BOP Challenge. Our results suggest that additional geometric features can improve the relevant metrics at little additional cost.

Language: en
Industry sector: Automotive Industry; Healthcare; Information Technology; Cultural and Creative Economy
Research Line: Computer vision (CV); Machine learning (ML)
LTA: Scalable architectures for massive data sets; Machine intelligence, algorithms, and data structures (incl. semantics)
Keywords: 3D Computer vision; Machine learning; Pattern recognition; 3D Object localisation
Type: conference paper
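Illustrative sketch: the abstract does not state which geometric feature was implemented or how it is fed into matching. The example below is therefore only an assumption-laden illustration of the general idea. It takes area-weighted per-vertex surface normals as one possible geometric feature that can be computed directly from a triangle mesh, and simple concatenation as one possible way to supply it as an additional input alongside a learned descriptor. The function names `vertex_normals` and `augment_features` are hypothetical and do not come from the paper or from GigaPose.

```python
import math


def vertex_normals(vertices, faces):
    """Area-weighted per-vertex unit normals of a triangle mesh.

    vertices: list of (x, y, z) tuples; faces: list of (i, j, k) index triples.
    Assumption: normals stand in for the (unspecified) geometric feature.
    """
    normals = [[0.0, 0.0, 0.0] for _ in vertices]
    for i, j, k in faces:
        a, b, c = vertices[i], vertices[j], vertices[k]
        e1 = [b[d] - a[d] for d in range(3)]
        e2 = [c[d] - a[d] for d in range(3)]
        # The cross product has length 2 * triangle area, so summing the
        # unnormalized face normals weights each face by its area.
        n = [
            e1[1] * e2[2] - e1[2] * e2[1],
            e1[2] * e2[0] - e1[0] * e2[2],
            e1[0] * e2[1] - e1[1] * e2[0],
        ]
        for idx in (i, j, k):
            for d in range(3):
                normals[idx][d] += n[d]
    unit = []
    for n in normals:
        length = math.sqrt(sum(x * x for x in n))
        # Vertices not referenced by any face keep a zero vector.
        unit.append([x / length for x in n] if length > 0 else [0.0, 0.0, 0.0])
    return unit


def augment_features(learned, geometric):
    """Concatenate each learned descriptor with its geometric feature.

    Assumption: concatenation is just one way to provide an additional input.
    """
    return [list(l) + list(g) for l, g in zip(learned, geometric)]


if __name__ == "__main__":
    # A single triangle in the xy-plane, counter-clockwise when seen from +z.
    verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    tris = [(0, 1, 2)]
    geo = vertex_normals(verts, tris)
    # Placeholder 2-D "learned" descriptors, one per vertex.
    learned = [[0.5, 0.2], [0.1, 0.9], [0.3, 0.3]]
    print(augment_features(learned, geo))
```

Because such features depend only on the mesh, they could in principle be computed once per object during onboarding and reused for every query image.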