2025
Conference Paper
Title

Zero-Shot Open-Vocabulary OOD Object Detection and Grounding using Vision Language Models

Abstract
Automated driving involves complex perception tasks that require a precise understanding of diverse traffic scenarios and confident navigation. Traditional data-driven algorithms trained on closed-set data often fail to generalize to out-of-distribution (OOD) and edge cases. Recently, Large Vision Language Models (LVLMs) have shown potential in integrating the reasoning capabilities of language models to understand and reason about complex driving scenes, aiding generalization to OOD scenarios. However, grounding such OOD objects remains a challenging task. In this work, we propose zPROD, an automated framework for zero-shot promptable open-vocabulary OOD object detection, segmentation, and grounding in autonomous driving. We leverage LVLMs with visual grounding capabilities, eliminating the need for lengthy text communication and providing precise indications of OOD objects in the scene or on the track of the egocentric vehicle. We evaluate our approach on OOD datasets from existing road anomaly segmentation benchmarks such as SMIYC and Fishyscapes. Our zero-shot approach shows superior performance on RoadAnomaly and RoadObstacle and comparable results on the Fishyscapes subset relative to supervised models, and acts as a baseline for future zero-shot methods based on open-vocabulary OOD detection.
Author(s)
Sinhamahapatra, Poulami  
Fraunhofer-Institut für Kognitive Systeme IKS  
Bose, Shirsha
Fraunhofer-Institut für Kognitive Systeme IKS  
Roscher, Karsten  
Fraunhofer-Institut für Kognitive Systeme IKS  
Günnemann, Stephan
Technische Universität München  
Mainwork
6th Northern Lights Deep Learning Conference, NLDL 2025  
Project(s)
IKS-Ausbauprojekt  
safe.trAIn
Funder
Bayerisches Staatsministerium für Wirtschaft, Landesentwicklung und Energie  
Bundesministerium für Wirtschaft und Klimaschutz  
Conference
Northern Lights Deep Learning Conference 2025  
Open Access
Rights
CC BY 4.0: Creative Commons Attribution
DOI
10.24406/publica-4460
Language
English
Fraunhofer-Institut für Kognitive Systeme IKS  
Fraunhofer Group
Fraunhofer-Verbund IUK-Technologie  
Keyword(s)
  • out of distribution
  • OOD
  • object detection
  • OOD object detection
  • zero-shot
  • open vocabulary
  • segmentation
  • autonomous driving
  • vision language model
  • large vision language model
  • LVLM
