Learned Data Augmentation for Model Optimization
This thesis explores the effectiveness of shape variation in occlusion-type augmentations by means of a shape generation policy, called Shapeshifting, that constructs occlusion areas of different shapes. In this approach, a reference point is randomly initialized within the image, and several polygon vertices are then randomly placed around it to construct the occlusion area. Further improvements obtained by applying the approach multiple times, in unstructured and structured ways referred to as K-Shapeshifting and Structured Shapeshifting, were also explored. This thesis also explores a segmentation-based occlusion approach, called Semantic Occlusion, which constructs the occlusion mask by selecting an arbitrary region from the segmentation inferred by an unsupervised semantic segmentation method. This approach was extended by evaluating the effect of occluding multiple regions of the semantic segmentation. The proposed approaches were evaluated against a baseline consisting of the widely used augmentation policy of random crop together with random flip. On the benchmark dataset CIFAR10, ResNet18 with Shapeshifting achieved an accuracy of 0.9574, an improvement over the baseline accuracy of 0.9528. On SVHN, the multi-component variant of Shapeshifting, K-Shapeshifting, achieved an accuracy of 0.9731, an improvement over the baseline accuracy of 0.9631. On STL10, the same policy, K-Shapeshifting, achieved an accuracy of 0.9729, compared with the baseline accuracy of 0.9704. Shapeshifting, which uses a polygon generation algorithm to construct the occlusion mask, thus achieved competitive performance.
While it is shown that adding more polygon vertices for the construction of the occlusion polygon contributes to a significant improvement in performance in the proposed setting, the experimental results also provide empirical evidence that this improvement is mainly attributable to the underlying increase in occlusion ratio. It is therefore concluded that shape variation of the occlusion mask does not contribute significantly to model accuracy. The proposed Semantic Occlusion approach, which uses a single region of the inferred segmentation to construct the occlusion mask, achieved 0.9444 on CIFAR10, 0.9649 on SVHN and 0.9637 on STL10. The extension, K-Semantic Occlusion, improved on these results with 0.9502 on CIFAR10, 0.9678 on SVHN and 0.967 on STL10. The segmentation-based approach improved over the baseline only on SVHN and did not outperform Shapeshifting.
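The region-selection step behind Semantic Occlusion and K-Semantic Occlusion can be sketched as follows. This is a hedged illustration only: it assumes the unsupervised segmentation is already available as a 2D map of integer region labels, that regions are chosen uniformly at random, and that occluded pixels are overwritten with a constant fill value. The function names and the `k` and `fill` parameters are hypothetical and not taken from the thesis.

```python
import random


def semantic_occlusion_mask(labels, k=1, rng=None):
    """Mark every pixel that belongs to one of k randomly chosen regions.

    labels: 2D list of integer region ids (assumed output of a segmentation).
    """
    rng = rng or random.Random()
    region_ids = sorted({label for row in labels for label in row})
    # Never request more regions than the segmentation contains.
    chosen = set(rng.sample(region_ids, min(k, len(region_ids))))
    return [[1 if label in chosen else 0 for label in row] for row in labels]


def apply_mask(image, mask, fill=0):
    """Replace occluded pixels of a 2D image with a constant fill value."""
    return [
        [fill if occluded else pixel for pixel, occluded in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]
```

With `k=1` this corresponds to occluding a single region; larger `k` mimics the multi-region extension under the same assumptions.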
Darmstadt, TU, Master Thesis, 2020