2025
Journal Article
Title
BrownViTNet: Hybrid CNN-Vision Transformer Model for the Classification of Brownfields in Aerial Imagery
Abstract
Identifying brownfield sites in satellite imagery is a crucial yet challenging classification problem in remote sensing. In this study, we use aerial images sourced from Google Maps, Bing Maps, and national aerial and satellite imagery (DOP20), covering three distinct land use classes: active areas, construction sites, and brownfields, all matched across the same geographical coordinates. The images in our dataset are initially set against a 1000 × 1000 pixel blank canvas, which often leaves a significant portion of unused black space. To address this, images were cropped subject to a threshold enforcing a minimum height and width of 400 pixels, which reduced the dataset substantially: by 42.8% for active areas, 65.93% for construction areas, and 75.86% for brownfield images. Given the reduced size of our usable dataset, we employed single-image super-resolution to enhance image quality and effectively double our dataset size as part of training-data augmentation. In the model architecture, the initial four layers form a convolutional neural network (CNN), followed by intermediate layers that use a vision transformer with a patch size of 16. This novel hybrid brownfield vision transformer network architecture demonstrated impressive accuracy in classifying brownfields from satellite imagery, presenting a significant advance in the use of machine learning for environmental monitoring and urban planning.
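The cropping step described in the abstract can be illustrated with a minimal sketch. It rests on one possible reading of the abstract, not the authors' published procedure: each image is trimmed to the bounding box of its non-black pixels, and images whose trimmed height or width falls below the threshold are discarded. The function names (`content_bbox`, `crop_if_large_enough`), the grayscale list-of-rows image representation, and the treatment of pixel value 0 as blank canvas are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Assumption: a grayscale image is a list of rows of integer intensities,
# and the blank canvas is pure black (value 0).
Image = List[List[int]]


def content_bbox(img: Image, black: int = 0) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right) of the non-black region,
    with bottom/right exclusive, or None if the image is entirely black."""
    rows = [r for r, row in enumerate(img) if any(p > black for p in row)]
    if not rows:
        return None
    cols = [c for c in range(len(img[0])) if any(row[c] > black for row in img)]
    return rows[0], cols[0], rows[-1] + 1, cols[-1] + 1


def crop_if_large_enough(img: Image, min_side: int = 400) -> Optional[Image]:
    """Crop away the blank canvas; return None when the remaining content
    is smaller than min_side in either dimension (image is discarded)."""
    box = content_bbox(img)
    if box is None:
        return None
    top, left, bottom, right = box
    if bottom - top < min_side or right - left < min_side:
        return None
    return [row[left:right] for row in img[top:bottom]]


# Toy example: a 10 x 10 black canvas holding a 6 x 4 block of content.
canvas = [[0] * 10 for _ in range(10)]
for r in range(2, 8):
    for c in range(3, 7):
        canvas[r][c] = 255

kept = crop_if_large_enough(canvas, min_side=4)      # content is 6 x 4, kept
dropped = crop_if_large_enough(canvas, min_side=5)   # width 4 < 5, discarded
```

With the 400-pixel threshold from the abstract, the same filter applied to 1000 × 1000 canvases would discard every image whose actual content is narrower or shorter than 400 pixels, which is consistent with the large per-class reductions reported.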
Author(s)