2025
Conference Paper
Title
Advanced Post-processing for Object Detection Dataset Generation
Abstract
Fast acquisition and generation of training data is an important problem for the training of Deep Neural Networks (DNNs). Previous work using luminance keying for efficient training data acquisition achieves good results with very low effort: the black background absorbs light so strongly that objects can be keyed by luminance instead of chroma. However, it does not reach the performance levels of real-world recordings. This paper significantly improves the achievable performance of luminance keying, as evaluated for the use case of object detection. We introduce a novel post-processing pipeline incorporating a denoising diffusion probabilistic model (DDPM) to capitalize on luma-key recordings. First, we employ low-rank adaptation (LoRA) to teach the recorded objects to a diffusion model. Second, we use Depth Anything to estimate the depth of the luma-key data. Third, utilizing ControlNet with depth estimates and Canny image filtering for guidance, we generate photo-realistic training images using a wide range of relevant prompts, which increases model robustness in diverse environments. Applying the high-quality masks of luminance keying, we obtain perfect ground truth for the object detection training. Extensive testing on the YCB-V object set demonstrates that our approach performs favorably compared to traditional techniques that require 3D meshes and material data, such as physically-based rendering or in-distribution dataset splits. Our proposed pipeline improves luminance keying to provide an efficient methodology for creating high-quality training datasets, facilitating the swift development and training of state-of-the-art DNNs for object detection, and is applicable to similar tasks such as classification and segmentation.
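To illustrate how a luminance-keying mask can yield ground truth for object detection, the following is a minimal sketch, not the authors' implementation. It assumes 8-bit RGB pixels, Rec. 601 luma weights, and a hypothetical fixed threshold of 30; the function names and the threshold are illustrative choices that do not come from the paper. It is kept dependency-free, so the image is a plain list of pixel rows.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]      # 8-bit (R, G, B)
Image = List[List[Pixel]]         # rows of pixels
Mask = List[List[bool]]           # True where the object is
Box = Tuple[int, int, int, int]   # (x_min, y_min, x_max, y_max), inclusive


def luma(pixel: Pixel) -> float:
    """Rec. 601 luma of an 8-bit RGB pixel."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def luma_key_mask(image: Image, threshold: float = 30.0) -> Mask:
    """Mark pixels brighter than the (assumed) threshold as foreground.

    The dark, light-absorbing background falls below the threshold.
    """
    return [[luma(p) > threshold for p in row] for row in image]


def mask_to_box(mask: Mask) -> Optional[Box]:
    """Tight bounding box around all foreground pixels, or None if empty."""
    xs = [x for row in mask for x, on in enumerate(row) if on]
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not xs:
        return None
    return (min(xs), min(ys), max(xs), max(ys))


if __name__ == "__main__":
    black: Pixel = (5, 5, 5)
    bright: Pixel = (200, 180, 160)
    # 4 rows x 5 columns; a 2x2 bright object at rows 1-2, columns 2-3.
    image: Image = [[black] * 5 for _ in range(4)]
    for y in (1, 2):
        for x in (2, 3):
            image[y][x] = bright
    print(mask_to_box(luma_key_mask(image)))
```

In a pipeline like the one described above, the same mask could label any generated image that preserves the object's position, since the keyed object region does not change when the background is replaced.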
Keyword(s)
Sector: Automotive Industry
Sector: Healthcare
Sector: Cultural and Creative Economy
Research Line: Computer graphics (CG)
Research Line: Computer vision (CV)
Research Line: Machine learning (ML)
LTA: Scalable architectures for massive data sets
LTA: Machine intelligence, algorithms, and data structures (incl. semantics)
LTA: Generation, capture, processing, and output of images and 3D models
3D Computer vision
Machine learning
Pattern recognition