Fraunhofer-Gesellschaft
August 7, 2025
Conference Paper
Title

EdgeYOLO+Depth: Extending object detection for real-time depth estimation on smartphones

Abstract
Monocular cameras are widely used in consumer devices, robotics and industrial systems. While object detection is well-established for identifying relevant objects, many applications require additional instance depth information. However, depth estimation methods typically rely on complex models unsuitable for edge devices and real-time applications. We investigate lightweight approaches to add instance depth estimation to object detection systems without increasing computational requirements, optimized for real-time deployment on edge devices such as smartphones. We improve the existing DisNet approach, which uses only bounding box dimensions as input and is compatible with any object detection model. Additionally, we extend an edge-optimized YOLO model with a depth estimation component by modifying only the model's head (EdgeYOLO+Depth). Our approaches are evaluated on an internal dataset and the KITTI dataset. On KITTI, our DisNet-based method achieves a relative error of 6.55 %, while our EdgeYOLO+Depth approach reaches 3.59 %, outperforming comparable methods. We demonstrate that with proper tuning of the depth component's weight in the loss function, the additional depth estimation has no negative impact on object detection results. Our experiments show that model performance is robust to minor input transformations (horizontal shifts: +0.24 pp relative error) and degrades acceptably under more substantial geometric changes (rotation: +3.86 pp relative error). Regarding inference times, our DisNet variant adds only 0.15 ms for depth estimation, while EdgeYOLO+Depth requires no additional inference time, delivering complete object detection with depth estimation at 52 FPS on a Samsung Galaxy S23. Our code is publicly available at: https://github.com/christoph-i/EdgeDepth.
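The abstract's relative-error figures and the "depth component's weight in the loss function" can be sketched as follows. The function names, the weight value `lambda_depth`, and the toy depth values are illustrative assumptions, not the paper's actual implementation (see the linked repository for that).

```python
def mean_relative_error(d_pred, d_true):
    """Mean relative depth error as a percentage: mean(|pred - true| / true) * 100.
    This is the style of metric behind figures like '6.55 %' in the abstract."""
    return sum(abs(p - t) / t for p, t in zip(d_pred, d_true)) / len(d_true) * 100.0

def combined_loss(loss_det, loss_depth, lambda_depth=0.05):
    """Joint training objective: detection loss plus a weighted depth term.
    lambda_depth is the tunable weight the abstract refers to; the default
    value here is purely illustrative."""
    return loss_det + lambda_depth * loss_depth

# Toy example: predicted vs. ground-truth instance depths in metres.
pred = [10.2, 25.6, 48.0]
true = [10.0, 26.0, 50.0]
print(f"relative error: {mean_relative_error(pred, true):.2f} %")
```

Tuning `lambda_depth` trades depth accuracy against detection accuracy; the abstract reports that a suitable weight leaves detection results unaffected.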
Author(s)
Ilse, Christoph
Ketzler, Fabian
Pfeifer, Niko (Fraunhofer-Institut für Offene Kommunikationssysteme FOKUS)
Massow, Kay (Fraunhofer-Institut für Offene Kommunikationssysteme FOKUS)
Maiwald, Friedrich
Radusch, Ilja (Fraunhofer-Institut für Offene Kommunikationssysteme FOKUS)
Mainwork
International Conference on Artificial Intelligence, Computer, Data Sciences, and Applications, ACDSA 2025  
Conference
International Conference on Artificial Intelligence, Computer, Data Sciences, and Applications 2025  
DOI
10.1109/ACDSA65407.2025.11166075
Language
English
Institute
Fraunhofer-Institut für Offene Kommunikationssysteme FOKUS
Keyword(s)
  • object detection
  • depth estimation
  • monocular depth estimation
  • machine learning
  • edge computing