Sustainability potential analysis of the appropriateness and effort of data annotation for machine learning (ML) models. Final report

Catalli, Flaminia; Hochenegger, Franziska; Reitz, Thorsten; Klien, Eva; Kocon, Kevin; Krämer, Michel

doi:10.60810/openumwelt-7930

2026

Report

Abstract

Digital transformation is leading to data being increasingly regarded as a valuable resource, particularly in the environmental sector. The LabelledGreenData4All project investigates the strategic importance of annotated environmental data for the use of machine learning (ML) and, more broadly, artificial intelligence (AI) in addressing societal and environmental-policy challenges. The aim was to identify application areas with high potential for ML models and to develop a process model for carrying out data annotation along defined decision paths, taking into account different starting situations regarding the availability of annotated training data. The annotation itself was to be carried out in a resource-efficient manner. In addition, strategic and policy recommendations for the cross-sectoral provision of environmental data were to be developed as a basis for future funding measures.
Methodologically, an exploratory research approach was chosen, consisting of a systematic review of the current state of science and technology, a needs assessment based on surveys and stakeholder workshops, and the prototypical implementation of a decision model for data annotation.
The potential and impact analysis shows that improved data availability and standardization are essential for the effective use of AI in the environmental sector. The main challenges concern data quality, interoperability, and the legal framework. The potential analysis underlines that AI offers considerable benefits for environmental management, political decision-making, and sustainable economic strategies. At the same time, AI entails ecological risks, including high energy and water consumption and a growing volume of electronic waste, as well as ethical risks that can exacerbate social inequalities.
The recommendations for sharing annotated environmental data include promoting the FAIR principles, standardizing data formats, establishing data spaces and data trustees as digital ecosystems, and creating incentive systems for data provision.
The process model focuses on developing and using ML applications when only limited annotated data is available; during prototyping, pseudo-labeling and transfer learning were identified as particularly promising.
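Pseudo-labeling, one of the two approaches named above, can be illustrated with a minimal generic sketch. A model trained on the few manually annotated samples predicts labels for unannotated samples; only predictions above a confidence threshold are accepted as training labels before the model is retrained, and uncertain samples are left over for manual annotation. This is not the project's prototype: the nearest-centroid base model, the softmax-over-distances confidence, the threshold of 0.9, the example classes and all function names (`fit_centroids`, `predict_proba`, `pseudo_label`) are hypothetical choices made for illustration only.

```python
# Hypothetical illustration of generic pseudo-labeling, NOT the LabelledGreenData4All prototype.
import math
from collections import defaultdict


def fit_centroids(xs, ys):
    """Toy base model (assumption): one mean feature vector (centroid) per class."""
    sums = {}
    counts = defaultdict(int)
    for x, y in zip(xs, ys):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] += 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict_proba(model, x):
    """Confidence (assumption): softmax over negative Euclidean distances to centroids."""
    scores = {y: -math.dist(x, c) for y, c in model.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    total = sum(exps.values())
    return {y: e / total for y, e in exps.items()}


def pseudo_label(labelled_x, labelled_y, unlabelled_x, threshold=0.9, max_rounds=5):
    """Iteratively add confident predictions on unlabelled data as training labels."""
    xs, ys = list(labelled_x), list(labelled_y)
    pseudo = []  # (features, pseudo-label) pairs accepted so far
    pool = list(unlabelled_x)
    for _ in range(max_rounds):
        model = fit_centroids(xs, ys)  # retrain on manual + pseudo labels
        remaining = []
        for x in pool:
            probs = predict_proba(model, x)
            label, p = max(probs.items(), key=lambda kv: kv[1])
            if p >= threshold:
                pseudo.append((x, label))
                xs.append(x)
                ys.append(label)
            else:
                remaining.append(x)  # too uncertain: candidate for manual annotation
        if len(remaining) == len(pool):
            break  # no new pseudo-labels in this round
        pool = remaining
    return fit_centroids(xs, ys), pseudo, pool


if __name__ == "__main__":
    model, pseudo, left = pseudo_label(
        [[0.0, 0.0], [10.0, 10.0]], ["water", "forest"],
        [[0.5, 0.2], [9.8, 10.1], [5.0, 5.0]],
    )
    print(pseudo)  # [([0.5, 0.2], 'water'), ([9.8, 10.1], 'forest')]
    print(left)    # [[5.0, 5.0]]
```

With this toy data, the two points close to a class centroid receive pseudo-labels, while the ambiguous point midway between the classes stays in the pool. In a real annotation workflow, that residual pool is where manual annotation effort would be concentrated; the threshold trades the number of accepted pseudo-labels against the risk of reinforcing the model's own errors.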
Overall, the results contribute to the efficient and sustainable design of AI applications in the environmental sector and provide data-based support for environmental policy measures.