Deep Learning based Vehicle Detection in Aerial Imagery
The use of airborne platforms, such as unmanned aerial vehicles (UAVs), equipped with camera sensors is essential for a wide range of applications in the field of civil safety and security. Prominent applications include surveillance and reconnaissance, traffic monitoring, search and rescue, disaster relief, and environmental monitoring. However, analyzing the aerial imagery data solely by human operators is often not practicable due to the large amount of visual data and the resulting cognitive overload. In practice, automated processing chains based on appropriate computer vision algorithms are employed to assist human operators in assessing the aerial imagery data. A key component of such processing chains is the accurate detection of all relevant objects inside the camera's field of view, before the scene can be analyzed and interpreted. The low spatial resolution originating from the large distance between camera and ground makes object detection in aerial imagery a challenging task, which is further impeded by motion blur, occlusions, and shadows. Although many conventional approaches for object detection in aerial imagery exist in the literature, the limited representation capacity of the utilized handcrafted features often inhibits reliable detection accuracies, given the high variance in object scale, orientation, color, and shape. Within the scope of this thesis, a novel deep learning based detection approach is developed, with a focus on vehicle detection in aerial imagery recorded in top view. For this purpose, Faster R-CNN is chosen as the base detection framework because of its superior detection accuracy compared to other deep learning based detectors.
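To illustrate how small the object dimensions become at typical operating altitudes, the ground sampling distance (GSD) can be computed from the camera geometry. The concrete altitude, focal length, and pixel pitch below are assumed example values for illustration, not parameters from the thesis:

```python
def ground_sampling_distance(altitude_m, focal_length_mm, pixel_pitch_um):
    """GSD in metres per pixel for a nadir (top-view) camera.

    GSD = altitude * pixel_pitch / focal_length, with units converted to metres.
    """
    return altitude_m * (pixel_pitch_um * 1e-6) / (focal_length_mm * 1e-3)

# Assumed example configuration: 1000 m altitude, 50 mm lens, 5 um pixel pitch.
gsd = ground_sampling_distance(altitude_m=1000, focal_length_mm=50, pixel_pitch_um=5)
car_length_px = 4.5 / gsd  # a typical car of about 4.5 m length
print(f"GSD: {gsd:.2f} m/px, car length: {car_length_px:.0f} px")
# -> GSD: 0.10 m/px, car length: 45 px
```

At such resolutions a vehicle spans only a few tens of pixels, which is far below the object sizes that standard detection frameworks were tuned for.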
Relevant adaptations to account for the specific characteristics of aerial imagery, especially the small object dimensions, are systematically examined, and resulting issues with respect to real-world applications, i.e., the high number of false detections caused by vehicle-like structures and the poor inference time, are identified. Two novel components are proposed to improve the detection accuracy by enhancing the contextual content of the employed feature representation. The first component aims at increasing spatial context information by combining features of shallow and deep layers to account for fine and coarse structures, while the second component leverages semantic labeling, i.e., the pixel-wise classification of an image, to introduce more semantic context information. Two different variants to integrate semantic labeling into the detection framework are realized: exploiting the semantic labeling results to filter out unlikely predictions, and inducing scene knowledge by explicitly merging the semantic labeling network into the detection framework via shared feature representations. Both components clearly reduce the number of false detections, resulting in considerably improved detection accuracies. To reduce the computational effort and consequently the inference time, two alternative strategies are developed in the context of this thesis. The first strategy replaces the default CNN architecture used for feature extraction with a lightweight CNN architecture optimized for vehicle detection in aerial imagery, while the second strategy comprises a novel module that restricts the search to areas of interest. The proposed strategies result in clearly reduced inference times for each component of the detection framework. Combining the proposed approaches significantly improves the detection performance compared to the standard Faster R-CNN detector taken as baseline.
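The first integration variant, filtering unlikely predictions with the semantic labeling result, can be sketched as follows. The class id, overlap threshold, and box format are illustrative assumptions, not the exact procedure from the thesis:

```python
import numpy as np

ROAD = 1  # assumed class id for road pixels in the semantic label map

def filter_by_semantics(boxes, label_map, min_road_fraction=0.3):
    """Keep only boxes (x1, y1, x2, y2) whose area contains at least
    `min_road_fraction` road pixels in the pixel-wise label map,
    discarding detections on implausible surfaces (illustrative sketch)."""
    kept = []
    for x1, y1, x2, y2 in boxes:
        patch = label_map[y1:y2, x1:x2]
        if patch.size and (patch == ROAD).mean() >= min_road_fraction:
            kept.append((x1, y1, x2, y2))
    return kept

# Toy scene: a horizontal road band across an otherwise non-road image.
label_map = np.zeros((100, 100), dtype=int)
label_map[40:60, :] = ROAD
boxes = [(10, 45, 20, 55),  # lies on the road -> plausible vehicle
         (10, 5, 20, 15)]   # lies off the road -> likely false detection
print(filter_by_semantics(boxes, label_map))
# -> [(10, 45, 20, 55)]
```

The second detection in the toy example mimics a vehicle-like structure on a rooftop or field; suppressing such predictions is exactly the false-detection reduction the semantic context component targets.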
Furthermore, existing approaches for vehicle detection in aerial imagery, taken from the literature, are outperformed both quantitatively and qualitatively on different aerial imagery datasets. The generalization ability is further demonstrated on a large set of previously unseen data collected from novel aerial imagery datasets with differing properties.
Karlsruhe, Inst. für Technologie (KIT), Diss., 2020