STYLE: Style Transfer for Synthetic Training of a YoLo6D Pose Estimator
STYLE: Style Transfer zum synthetischen Training eines YOLO6D Posendetektors
Supervised training of deep neural networks requires a large amount of training data. Since labeling is time-consuming and error-prone, and many applications lack data sets of adequate size, research soon turned to generating this data synthetically, e.g. by rendering images, which makes annotation free and allows other sources of available data to be used, for example CAD models. However, unless much effort is invested, synthetically generated data usually does not exhibit exactly the same properties as real-world data. In the context of images, the distribution of image features differs between synthetic and real imagery; this difference is called the domain gap. The domain gap reduces the transferability of synthetically trained models and hurts their real-world inference performance. Current state-of-the-art approaches to mitigating this problem concentrate on domain randomization: overwhelming the model's feature extractor with enough variation to force it to learn more meaningful features, so that real-world images effectively become nothing more than one additional variation. The main problem with most domain randomization approaches is that they require the practitioner to decide on the amount of randomization, which is why research calls this "blind" randomization. Domain adaptation, in contrast, tackles the domain gap directly without the assistance of the practitioner, which makes this approach seem superior. This work deals with training a DNN-based object pose estimator in three scenarios: first, a small number of real-world images of the objects of interest is available; second, no images are available, but object-specific texture is given; and third, neither images nor textures are available. Instead of copying successful randomization techniques, these three problems are tackled mainly with domain adaptation techniques.
The main proposition is the adaptation of general-purpose, widely available, pixel-level style transfer to directly tackle the differences in features found in images from different domains. To that end, several approaches are introduced and tested, corresponding to the three scenarios. It is demonstrated that in scenarios one and two, conventional conditional GANs can drastically reduce the domain gap, thereby improving performance by a large margin compared to non-photorealistic renderings. More importantly, ready-to-use style transfer solutions improve performance significantly compared to a model trained with the same degree of randomization, even when no real-world data of the target objects is available (scenario three), thereby reducing the reliance on domain randomization.
Darmstadt, TU, Master Thesis, 2020