2025
Conference Paper
Title
A Comparative Evaluation of Vision Language Models for Waste Classification in Few-Shot Settings
Abstract
Efficient waste classification is essential for sustainable waste management systems: accurate sorting can significantly enhance recycling efforts and reduce pollution. However, traditional computer vision methods often require large annotated datasets and extensive retraining, which limits their adaptability to varying waste types and challenging real-world conditions. In this study, we evaluate the potential of Multimodal Large Language Models (MLLMs) and Vision-Language Models (VLMs) for adaptive waste classification, focusing on zero-shot and few-shot learning scenarios. Using TrashNet and our custom MultiWaste dataset, we test a method, VLM-NN, that uses a CLIP VLM for feature extraction and a simple Nearest Neighbour classifier. VLM-NN shows robust few-shot capabilities and scales well, achieving an accuracy of 97.74% on TrashNet. While MLLMs exhibit strong zero-shot capabilities, their utility diminishes as the number of labelled samples increases, due to high computational costs. In contrast, VLM-NN is computationally efficient but struggles with extremely limited training data. Our results show the potential of Large Pretrained Models for waste classification and provide guidance on which model architectures to consider for different amounts of training data.
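The nearest-neighbour step of a VLM-NN style pipeline can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes image embeddings have already been produced by some encoder (for example a CLIP image encoder), uses cosine similarity as the distance measure, and works on made-up three-dimensional vectors. The function names and the toy labels are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def nearest_neighbour_label(query, support_embeddings, support_labels):
    """Return the label of the labelled embedding most similar to the query."""
    if not support_embeddings or len(support_embeddings) != len(support_labels):
        raise ValueError("need one label per support embedding")
    best_index = max(
        range(len(support_embeddings)),
        key=lambda i: cosine_similarity(query, support_embeddings[i]),
    )
    return support_labels[best_index]


# Toy few-shot "support set": one made-up embedding per class.
# Real embeddings would come from an image encoder (assumption).
support_embeddings = [
    [0.9, 0.1, 0.0],
    [0.1, 0.9, 0.0],
    [0.0, 0.2, 0.9],
]
support_labels = ["glass", "paper", "metal"]

query = [0.8, 0.2, 0.1]  # embedding of an unlabelled image (made up)
predicted = nearest_neighbour_label(query, support_embeddings, support_labels)
print(predicted)
```

Adding a labelled sample in this setup only appends one embedding and one label to the support set, with no retraining. That is one plausible reading of why such an approach scales as more labelled data becomes available.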
Rights
Use according to copyright law
Language
English