Fraunhofer-Gesellschaft
2025
Conference Paper
Title

A Comparative Evaluation of Vision Language Models for Waste Classification in Few-Shot Settings

Abstract
Efficient waste classification is essential for sustainable waste management systems: accurate sorting can significantly enhance recycling efforts and reduce pollution. However, traditional computer vision methods often require large annotated datasets and extensive retraining, which limits their adaptability to varying waste types and challenging real-world conditions. In this study, we evaluate the potential of Multimodal Large Language Models (MLLMs) and Vision-Language Models (VLMs) for adaptive waste classification, focusing on zero-shot and few-shot learning scenarios. Using datasets such as TrashNet and our custom MultiWaste dataset, we test a method that uses a CLIP VLM for feature extraction and a simple Nearest Neighbour classifier (VLM-NN). This method shows robust few-shot capabilities and excellent scalability, achieving an accuracy of 97.74% on TrashNet. While MLLMs exhibit strong zero-shot capabilities, their utility diminishes as the number of labelled samples increases, owing to their high computational costs. In contrast, VLM-NN performs efficiently but struggles when training data is extremely limited. Our results show the potential of Large Pretrained Models for waste classification and provide guidance on which model architectures to consider for different amounts of training data.
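The VLM-NN idea described in the abstract (embed images with a pretrained encoder, then label each query by its nearest labelled example) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes cosine similarity as the distance measure, which the abstract does not specify, and it replaces real CLIP image embeddings with hypothetical three-dimensional toy vectors. The function names `cosine_similarity` and `nn_classify` and the class labels are illustrative only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two non-zero vectors given as sequences of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nn_classify(support_feats, support_labels, query_feats):
    """Give each query the label of its most similar support embedding (1-NN)."""
    predictions = []
    for query in query_feats:
        sims = [cosine_similarity(query, s) for s in support_feats]
        best = max(range(len(sims)), key=sims.__getitem__)
        predictions.append(support_labels[best])
    return predictions


# Hypothetical stand-ins for encoder outputs; a real pipeline would obtain
# these vectors from a pretrained image encoder such as CLIP (assumption).
support = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
labels = ["glass", "glass", "paper", "metal"]  # the few labelled samples

queries = [
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.2],
    [0.0, 0.1, 0.7],
]

predictions = nn_classify(support, labels, queries)
print(predictions)
```

Because the encoder stays frozen in a setup like this, adding a class or more labelled samples only means appending rows to `support` and `labels`. No retraining is involved, which is one plausible reading of the scalability the abstract reports.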
Author(s)
Funk, Jonas
Fraunhofer-Institut für Optronik, Systemtechnik und Bildauswertung IOSB  
Bäcker, Paul
Fraunhofer-Institut für Optronik, Systemtechnik und Bildauswertung IOSB  
Roming, Lukas  
Fraunhofer-Institut für Optronik, Systemtechnik und Bildauswertung IOSB  
Josekutty, Jerardh
Fraunhofer-Institut für Optronik, Systemtechnik und Bildauswertung IOSB  
Maier, Georg  
Fraunhofer-Institut für Optronik, Systemtechnik und Bildauswertung IOSB  
Längle, Thomas  
Fraunhofer-Institut für Optronik, Systemtechnik und Bildauswertung IOSB  
Mainwork
11th World Congress on Electrical Engineering and Computer Systems and Science, EECSS 2025. Proceedings  
Conference
World Congress on Electrical Engineering and Computer Systems and Science 2025  
International Conference on Machine Vision and Machine Learning 2025  
Rights
Use according to copyright law
DOI
10.11159/mvml25.113
10.24406/publica-6480
Language
English
Keyword(s)
  • Waste Classification
  • Multimodal Large Language Models
  • Vision Language Models
  • Zero-Shot Learning
  • Few-Shot Learning
  • CLIP