2026
Journal Article
Title

Evaluation of vision-language models for waste classification: Zero- and few-shot with a training-free tip-adapter

Abstract
Accurate waste classification is essential for effective recycling. For household waste, changing recycling policies and the introduction of new materials in consumer products continually reshape sorting categories across regions and over time. These changes challenge existing sensor-based sorting systems, which are trained for one specific classification task. Restoring accuracy through fully supervised retraining is costly because it requires new labeled data and full model updates. We investigate whether foundation models can deliver accurate image-based waste classification with few or no labeled examples. We evaluate multimodal large language models (MLLMs), vision–language models (VLMs), Vision Transformers (ViTs), and a baseline CNN on four datasets, including a new food vs. non-food packaging dataset, with varying numbers of labeled examples. We further propose adaptations to these approaches: chain-of-thought prompting for MLLMs, and ensemble prompting and a Tip-Adapter for VLMs. We show that MLLMs perform well on zero-shot classification and that larger models such as GPT-4o improve further with a few examples, but at a computational cost that is infeasible for industry-scale inference. As a comparatively faster route to zero-shot classification, VLMs reach an accuracy of 90.4% on TrashNet; by contrast, a CNN typically needs a few hundred labeled images to achieve similar performance. Using the training-free Tip-Adapter with only 10 labeled example images per class lifts macro-F1 by 8.1 points over the zero-shot VLM baseline. Overall, we propose a guideline for language-driven, training-free methods for waste classification.
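The abstract does not spell out how a training-free Tip-Adapter combines zero-shot and few-shot evidence, so the following is only a minimal sketch of the general idea as it is commonly described: zero-shot logits from class text embeddings are added to a cache lookup over the few labeled example features. All names, the toy 2-D feature vectors, and the values of `alpha`, `beta`, and `scale` are illustrative assumptions, not taken from the article; a real system would use image and text embeddings from a VLM such as CLIP.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def zero_shot_logits(query, text_weights, scale=100.0):
    """Scaled cosine similarity between the image feature and each class text feature."""
    return [scale * dot(query, w) for w in text_weights]


def tip_adapter_logits(query, text_weights, cache_keys, cache_labels,
                       alpha=2.0, beta=5.5, scale=100.0):
    """Zero-shot logits plus a few-shot cache term; no parameters are trained."""
    logits = zero_shot_logits(query, text_weights, scale)
    for key, label in zip(cache_keys, cache_labels):
        # Affinity is close to 1 when the query resembles a cached example
        # and decays exponentially as cosine similarity drops.
        affinity = math.exp(-beta * (1.0 - dot(query, key)))
        logits[label] += alpha * affinity
    return logits


def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])


# Toy data (assumed): two classes with hand-picked 2-D "embeddings".
text_weights = [l2_normalize([1.0, 0.0]), l2_normalize([0.0, 1.0])]
cache_keys = [
    l2_normalize([1.0, 0.0]), l2_normalize([0.96, 0.28]),  # labeled class 0
    l2_normalize([0.6, 0.8]), l2_normalize([0.7, 0.7]),    # labeled class 1
]
cache_labels = [0, 0, 1, 1]

# An ambiguous query that the text prompts alone assign to class 0.
query = l2_normalize([0.71, 0.70])

zero_shot_prediction = argmax(zero_shot_logits(query, text_weights))
tip_prediction = argmax(
    tip_adapter_logits(query, text_weights, cache_keys, cache_labels)
)
print(zero_shot_prediction, tip_prediction)
```

In this toy case the query sits almost exactly between the two text embeddings, so the zero-shot decision is marginal, while the cached class 1 examples lie much closer to the query and tip the combined score. With 10 labeled images per class, as in the abstract, the cache would simply hold 10 keys per class.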
Author(s)
Funk, Jonas
Fraunhofer-Institut für Optronik, Systemtechnik und Bildauswertung IOSB  
Bäcker, Paul
Fraunhofer-Institut für Optronik, Systemtechnik und Bildauswertung IOSB  
Roming, Lukas  
Fraunhofer-Institut für Optronik, Systemtechnik und Bildauswertung IOSB  
Josekutty, Jerardh
Fraunhofer-Institut für Optronik, Systemtechnik und Bildauswertung IOSB  
Maier, Georg  
Fraunhofer-Institut für Optronik, Systemtechnik und Bildauswertung IOSB  
Längle, Thomas  
Fraunhofer-Institut für Optronik, Systemtechnik und Bildauswertung IOSB  
Journal
Cleaner Waste Systems
Open Access
Rights
CC BY 4.0: Creative Commons Attribution
DOI
10.1016/j.clwas.2026.100475
10.24406/publica-7273
Language
English
Keyword(s)
  • Circular economy
  • Smart waste bins
  • CLIP
  • RealWaste
  • Municipal solid waste
  • Material recovery facility
