2025
Conference Paper
Title
Assessment of Self-Supervised Learning Techniques for Few-Shot Classification of Joint Hyperspectral and LiDAR DSM Data
Abstract
Numerous engineering disciplines rely on highly detailed, up-to-date land cover maps for daily decision-making. Over the last decade, researchers have approached land cover classification using supervised deep learning, which requires many labels per category. Recent literature indicates that labeling is costly, error-prone, and difficult to scale to the ever-growing volume of remote sensing data. Self-supervised learning emerged to learn feature representations from unlabeled datasets, facilitating, for instance, the resolution of few-shot downstream tasks by leveraging previously acquired knowledge through transfer learning. Since highly detailed, updatable maps often rely on the detection capabilities of hyperspectral and LiDAR-derived digital surface model (DSM) data, it is essential to quantify the potential of recent self-supervised learning methods to learn multimodal representations that enable accurate few-shot classification.
The current work addresses this challenge by comparing the representation learning ability of four modern self-supervised learning strategies. It first implements modality-specific encoders that handle hyperspectral and LiDAR-derived digital surface model data separately. Then, it couples each method’s architecture on top of the encoders, building pseudo-Siamese networks whose learning objectives are specific to each strategy. Following self-supervised pre-training, this study employs multi-level feature fusion to integrate learned features from various depths, enhancing the discrimination capability of each method. Finally, it performs non-parametric classification using the k-nearest neighbour classifier to assign categories to joint features at test time. The validation step uses two benchmark datasets for quality assessment. Extensive experiments show that the SimSiam-based strategy learned the most discriminative features across the studied datasets, achieving consistent and accurate classifications for five different label quantities per class.
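The last two steps of the pipeline described above (multi-level feature fusion followed by k-nearest neighbour classification) can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the abstract does not specify the fusion operator, so concatenation of L2-normalised features is an assumption, as are the Euclidean distance, the default `k`, and the hypothetical function names `fuse_features` and `knn_predict`. The inputs stand in for features that frozen, pre-trained encoders would produce at several depths.

```python
import numpy as np


def fuse_features(hsi_levels, dsm_levels):
    """Fuse per-depth features of both modalities into one joint vector per sample.

    Each list element is an array of shape (n_samples, dim_at_that_depth).
    Assumption: fusion is plain concatenation of L2-normalised features.
    """
    parts = []
    for feats in list(hsi_levels) + list(dsm_levels):
        feats = np.asarray(feats, dtype=float)
        norms = np.linalg.norm(feats, axis=1, keepdims=True)
        # Guard against division by zero for all-zero feature vectors.
        parts.append(feats / np.maximum(norms, 1e-12))
    return np.concatenate(parts, axis=1)


def knn_predict(train_x, train_y, test_x, k=5):
    """Label each test sample by majority vote among its k nearest training samples.

    train_y must hold non-negative integer class indices, and k must not
    exceed the number of training samples.
    """
    train_y = np.asarray(train_y)
    # Pairwise squared Euclidean distances, shape (n_test, n_train).
    diffs = test_x[:, None, :] - train_x[None, :, :]
    dist2 = (diffs ** 2).sum(axis=2)
    # Indices of the k closest labelled samples for every test sample.
    nearest = np.argsort(dist2, axis=1)[:, :k]
    votes = train_y[nearest]
    n_classes = int(train_y.max()) + 1
    counts = np.stack([np.bincount(row, minlength=n_classes) for row in votes])
    # Ties are resolved in favour of the lowest class index.
    return counts.argmax(axis=1)
```

In a few-shot setting, `train_x` would hold the fused features of the handful of labelled samples per class and `test_x` the fused features of all remaining samples; no classifier parameters are trained, which is what makes the evaluation non-parametric.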