Evaluation of CNNs for land cover classification in high-resolution airborne images
Semantic land cover classification of satellite or airborne images is becoming increasingly important for applications such as urban planning, road network analysis and environmental monitoring. Differing sensor orientations and varying illumination make classification challenging. Depending on the image source and the classification task, it is not always easy to name the most discriminative features for successful classification. To avoid feature selection, we transfer aspects of a feature-based classification approach to Convolutional Neural Networks (CNNs), which generate task-specific features internally. As land cover classes, we focus on buildings, roads, low vegetation (grass) and high vegetation (trees). Two different approaches are analyzed. The first approach, using InceptionResNetV2, stems from networks used for image recognition. The second approach uses a fully convolutional neural network (DeepLabV3+) that is typically applied to semantic image segmentation. Before processing, the image must be subdivided into tiles, both to make the data processable by the CNN and for computational reasons. The tiles for InceptionResNetV2 are derived from a superpixel segmentation. The tiles for DeepLabV3+ are overlapping tiles of a fixed size. The advantage of DeepLabV3+ is that its architecture up-samples the classification result automatically and thus produces a pixelwise labeling of the image content. As evaluation data for both approaches, we used the ISPRS benchmark for the city of Vaihingen, Germany, which contains true orthophotos and labeled ground truth for classification.
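The subdivision into overlapping tiles can be illustrated with a minimal sketch. The abstract gives neither the tile size nor the overlap, so the defaults below (512 pixels, 128 pixels of overlap) are assumptions chosen purely for illustration, as are the function names `tile_starts` and `overlapping_tiles`. The border handling, which shifts the last tile so that it ends at the image edge, is likewise one possible choice and not necessarily the one used in the evaluation.

```python
import numpy as np


def tile_starts(length, tile, stride):
    """Start offsets of tiles along one axis.

    The last tile is shifted back so that it ends exactly at the border,
    which keeps every tile at full size (an assumed border policy).
    """
    if length <= tile:
        return [0]
    starts = list(range(0, length - tile + 1, stride))
    if starts[-1] != length - tile:
        starts.append(length - tile)
    return starts


def overlapping_tiles(image, tile=512, overlap=128):
    """Cut an (H, W[, C]) array into overlapping square tiles.

    `tile` and `overlap` are hypothetical defaults, not values from the paper.
    Returns a list of ((row, col), tile_array) pairs, where (row, col) is the
    upper-left corner of the tile in the original image.
    """
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    stride = tile - overlap
    height, width = image.shape[:2]
    tiles = []
    for row in tile_starts(height, tile, stride):
        for col in tile_starts(width, tile, stride):
            tiles.append(((row, col), image[row:row + tile, col:col + tile]))
    return tiles


if __name__ == "__main__":
    # Synthetic stand-in for an orthophoto; a real image would be loaded instead.
    orthophoto = np.zeros((1000, 1500, 3), dtype=np.uint8)
    pieces = overlapping_tiles(orthophoto)
    print(len(pieces), pieces[0][1].shape)
```

Keeping the upper-left corner of each tile makes it possible to place per-tile predictions back into a full-size label map; how predictions in the overlapping regions are merged is left open here.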