Self-Optimizing Augmentation Pipeline

Kuijper, Arjan; Pöllabauer, Thomas; Boller, Andre
2025-01-16
2024
https://publica.fraunhofer.de/handle/publica/481390

The training effectiveness of deep neural network models is crucial for their success [53]. Inference time, the time required to generate results, is equally important because it determines where a model can be applied. Deep neural networks that train inefficiently or have long inference times can be unusable for many applications. Applications in real-world scenarios, such as industrial environments, require short inference times to remain competitive in real time. Similarly, the rapid increase in model sizes in recent years has led to growing interest in techniques that accelerate training. In particular, the emergence of very large deep networks such as LLMs makes it more important to complete training in a reasonable amount of time, as these models can require up to hundreds of GPU-years of training. To contribute to the latter problem, I propose a new training technique that aims to improve the training effectiveness of neural networks in the field of computer vision. I present the self-optimizing augmentation pipeline SOAP, applied and investigated in the field of 6D object pose estimation, more precisely on the 6D object pose estimator GDRNPP. SOAP requires a differentiable image generation process. For this purpose, the training process is analyzed separately with two different image generation models, PyTorch3D and Stable Diffusion. The proposed pipeline is evaluated on the LM-O, T-LESS and ITODD datasets, a subset of the seven core datasets of the BOP challenge, focusing on the task of model-based 6D localization of seen objects.
To allow benchmarking, I present an evaluation that is comparable to the corresponding method in the BOP Leaderboard [4].

en

Branche: Automotive
Branche: Healthcare
Branche: Information Technology
Branche: Maritime Economy
Branche: Cultural and Creative Economy
Research Line: Computer graphics (CG)
Research Line: Computer vision (CV)
Research Line: Machine learning (ML)
LTA: Scalable architectures for massive data sets
LTA: Machine intelligence, algorithms, and data structures (incl. semantics)
3D Computer vision
Deep learning
3D Pattern/Structure recognition
3D Object localisation

master thesis