2026
Journal Article
Title
Multimodal and hyperspectral dataset for segmentation of bulky waste using VIS, IR, NIR, and terahertz imaging
Abstract
This study presents WoodVIT, an annotated multi-sensor, multimodal, and hyperspectral dataset designed to support deep-learning-based classification and segmentation of bulky waste. The dataset comprises four sensor modalities that provide complementary information: high-resolution visible RGB images (VIS), hyperspectral near-infrared (NIR) images, temporally resolved thermal infrared (IR) images, and terahertz (THz) images with depth information. An image registration process aligns all modalities to a common reference frame, enabling near-pixel-precise fusion across sensors. WoodVIT contains 56 registered multi-sensor scenes, partitioned into 22,659 annotated patches with two main classes (wood and non-wood) and 16 subclass labels. It provides pixel-wise masks as well as patch-wise annotations to support both segmentation and classification tasks. The primary benchmark task is binary discrimination of wood versus non-wood. The dataset also includes challenging scenarios involving occlusion and concealed contaminants (e.g., embedded metals) to motivate robust multimodal fusion approaches. We provide predefined train/validation/test splits and report baseline results for convolutional neural networks and fusion architectures to establish reference performance. WoodVIT is publicly available to support research on multi-sensor learning for waste sorting.
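Because all four modalities are registered to a common reference frame, a pixel position refers to the same scene point in every sensor, which is what makes simple pixel-level (early) fusion possible. The sketch below illustrates that idea under stated assumptions: each modality is an (H, W, C) NumPy array, fusion is plain channel concatenation, and scenes are cut into non-overlapping square patches. The function names, channel counts, patch size, and the mapping of subclass IDs to wood are all hypothetical and not taken from the WoodVIT documentation. The dataset already ships patch-wise annotations, so the majority-vote labeling here is only an illustrative stand-in showing how a pixel mask relates to a binary patch label.

```python
import numpy as np

# HYPOTHETICAL: assume subclass IDs 0..15 and that IDs 0..7 denote wood types.
# The real subclass-to-class mapping is defined by the dataset, not here.
WOOD_SUBCLASSES = frozenset(range(8))


def fuse_modalities(vis, nir, ir, thz):
    """Early fusion: stack co-registered (H, W, C_k) arrays along channels.

    Registration is assumed to be done already, so all inputs must share
    the same height and width.
    """
    arrays = [vis, nir, ir, thz]
    h, w = vis.shape[:2]
    for a in arrays:
        if a.shape[:2] != (h, w):
            raise ValueError("modalities must share the same height and width")
    return np.concatenate([a.astype(np.float32) for a in arrays], axis=-1)


def extract_patches(fused, subclass_mask, patch=32):
    """Cut non-overlapping patches and derive a binary label per patch.

    Label 1 (wood) is assigned when more than half of the patch's pixels
    belong to a wood subclass (illustrative majority vote).
    """
    h, w = subclass_mask.shape
    wood = np.isin(subclass_mask, list(WOOD_SUBCLASSES))
    patches, labels = [], []
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            patches.append(fused[y:y + patch, x:x + patch])
            labels.append(int(wood[y:y + patch, x:x + patch].mean() > 0.5))
    if not patches:
        raise ValueError("image is smaller than one patch")
    return np.stack(patches), np.array(labels)


# Usage on synthetic data (channel counts are arbitrary placeholders).
rng = np.random.default_rng(0)
H, W = 64, 64
vis = rng.random((H, W, 3))   # RGB
nir = rng.random((H, W, 10))  # hyperspectral bands
ir = rng.random((H, W, 1))    # one thermal frame
thz = rng.random((H, W, 2))   # e.g. amplitude and depth
mask = np.zeros((H, W), dtype=int)
mask[:, 32:] = 12             # right half: a non-wood subclass
fused = fuse_modalities(vis, nir, ir, thz)
X, y = extract_patches(fused, mask, patch=32)
print(X.shape, y)  # (4, 32, 32, 16) [1 0 1 0]
```

Channel concatenation is the simplest fusion strategy and relies entirely on accurate registration; the fusion architectures benchmarked in the paper may combine modalities differently.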
Open Access
Rights
CC BY 4.0: Creative Commons Attribution
Language
English