2024
Master Thesis
Title
Self-Optimizing Augmentation Pipeline
Abstract
The training effectiveness of deep neural network models is crucial for their success [53]. In addition to training effectiveness, the inference time, i.e. the time required to generate results, plays an important role, as it determines a model's applicability. Deep neural networks that suffer from inefficient training or long inference times can be unusable for many applications, so performance in terms of training and inference time is an important factor across many fields of machine learning. In particular, applications in real-world scenarios, such as industrial environments, require short inference times to operate competitively in real time. Similarly, the rapid increase in model sizes in recent years has led to growing interest in techniques for accelerating training. The emergence of very large deep networks such as LLMs, which can require up to hundreds of GPU-years of training, makes it especially important to complete training in a reasonable amount of time. To address the latter problem, I propose a new training technique that aims to improve the training effectiveness of neural networks in the field of computer vision. I present the self-optimizing augmentation pipeline SOAP, applied and investigated in the field of 6D object pose estimation, more precisely on the 6D object pose estimator GDRNPP.
SOAP requires a differentiable image generation process. For this purpose, the training process is analyzed separately with two different image generation models, PyTorch3D and Stable Diffusion. The proposed pipeline is evaluated on the LM-O, T-LESS, and ITODD datasets, a subset of the seven core datasets of the BOP challenge, focusing on the task of model-based 6D localization of seen objects. To allow benchmarking, I present an evaluation that is comparable to the corresponding method on the BOP leaderboard [4].
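To illustrate the differentiable image generation requirement, the following minimal sketch shows how PyTorch3D can render an object view such that gradients from a downstream training loss flow back into augmentation parameters (here, the virtual camera distance and elevation). The sketch is illustrative only and not taken from the thesis; the mesh file and parameter choices are hypothetical.

# Minimal, hypothetical sketch of differentiable image generation with PyTorch3D.
# The mesh file and the choice of augmentation parameters are illustrative only.
import torch
from pytorch3d.io import load_objs_as_meshes
from pytorch3d.renderer import (
    FoVPerspectiveCameras, RasterizationSettings, MeshRenderer,
    MeshRasterizer, SoftPhongShader, PointLights, look_at_view_transform,
)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
mesh = load_objs_as_meshes(["object.obj"], device=device)  # placeholder asset

# Learnable augmentation parameters: camera distance and elevation (degrees).
aug_params = torch.nn.Parameter(torch.tensor([2.5, 30.0], device=device))

R, T = look_at_view_transform(dist=aug_params[0], elev=aug_params[1],
                              azim=0.0, device=device)
cameras = FoVPerspectiveCameras(device=device, R=R, T=T)
lights = PointLights(device=device, location=[[0.0, 0.0, 3.0]])

renderer = MeshRenderer(
    rasterizer=MeshRasterizer(
        cameras=cameras,
        raster_settings=RasterizationSettings(image_size=256),
    ),
    shader=SoftPhongShader(device=device, cameras=cameras, lights=lights),
)

image = renderer(mesh)        # (1, 256, 256, 4), differentiable w.r.t. aug_params
loss = image[..., :3].mean()  # stand-in for the pose estimator's training loss
loss.backward()               # gradients now reach the augmentation parameters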
Thesis Note
Darmstadt, TU, Master Thesis, 2024
Language
English
Keyword(s)
Industry: Automotive
Industry: Healthcare
Industry: Information Technology
Industry: Maritime Economy
Industry: Cultural and Creative Economy
Research Line: Computer graphics (CG)
Research Line: Computer vision (CV)
Research Line: Machine learning (ML)
LTA: Scalable architectures for massive data sets
LTA: Machine intelligence, algorithms, and data structures (incl. semantics)
3D Computer vision
Deep learning
3D Pattern/Structure recognition
3D Object localisation