Exploring fusion techniques in U-Net and DeepLab V3 architectures for multi-modal land cover classification

Qiu, Kevin; Budde, Lina E.; Bulatov, Dimitri; Iwaszczuk, Dorota

doi:10.1117/12.2636144

2022

Conference Paper

Abstract

Many deep learning architectures exist for semantic segmentation. In this paper, their application to multi-modal remote sensing data is examined. Two well-known network architectures, U-Net and DeepLab V3+, originally developed for RGB image data, are modified to accept additional input channels, such as near-infrared or depth information. Both networks use ResNet101 as the backbone, and the data preprocessing steps, including data augmentation, are identical. We compare both networks and experiment with different fusion techniques in U-Net and with hyper-parameters for weighting the input channels for fusion in DeepLab V3+. We also evaluate the effect of pre-training on RGB and non-RGB data. The results show marginally better performance of the DeepLab V3+ model compared to U-Net, while for certain classes, such as vehicles, U-Net yields slightly superior accuracy.
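The idea of feeding extra channels into a network designed for RGB input can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `fuse_early` and `expand_first_conv`, the `extra_weight` parameter, and the choice to initialise the new input filters with the mean of the RGB filters are all assumptions made for illustration. The sketch shows one simple form of early fusion (channel stacking with a weight on the non-RGB bands) and one common way to widen a first convolution layer so that RGB weights can be reused.

```python
import numpy as np


def fuse_early(rgb, extra, extra_weight=1.0):
    """Stack RGB (H, W, 3) with extra bands (H, W, K) along the channel axis.

    extra_weight is a hypothetical hyper-parameter that scales the
    non-RGB bands (e.g. near infrared, depth) before fusion.
    """
    if rgb.shape[:2] != extra.shape[:2]:
        raise ValueError("modalities must share the same height and width")
    return np.concatenate([rgb, extra_weight * extra], axis=-1)


def expand_first_conv(weights_rgb, n_extra):
    """Widen conv weights from (out, 3, kH, kW) to (out, 3 + n_extra, kH, kW).

    Assumption: each new input channel starts from the mean of the three
    RGB filters, so the existing RGB weights are kept unchanged.
    """
    mean_filter = weights_rgb.mean(axis=1, keepdims=True)
    extra = np.repeat(mean_filter, n_extra, axis=1)
    return np.concatenate([weights_rgb, extra], axis=1)


rng = np.random.default_rng(0)
rgb = rng.random((4, 4, 3))        # toy RGB patch
nir_depth = rng.random((4, 4, 2))  # toy near-infrared and depth bands

fused = fuse_early(rgb, nir_depth, extra_weight=0.5)

# A ResNet-style first layer has 64 filters of size 7x7 over 3 channels.
w = expand_first_conv(rng.standard_normal((64, 3, 7, 7)), n_extra=2)

print(fused.shape, w.shape)
```

In a real pipeline the widened weight array would be copied into the first layer of the backbone; other fusion variants, such as separate encoders per modality, would need a different structure than this single-stream sketch.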

Author(s)

Qiu, Kevin

Fraunhofer-Institut für Optronik, Systemtechnik und Bildauswertung IOSB

Budde, Lina E.

Bulatov, Dimitri

Fraunhofer-Institut für Optronik, Systemtechnik und Bildauswertung IOSB

Iwaszczuk, Dorota

Mainwork

Earth Resources and Environmental Remote Sensing/GIS Applications XIII

Conference

Conference "Earth Resources and Environmental Remote Sensing/GIS Applications" 2022
