Fraunhofer-Gesellschaft
2025
Conference Paper
Title

A Survey on Metadata for Machine Learning Models and Datasets: Standards, Practices, and Harmonization Challenges

Abstract
The growing availability of machine learning (ML) models, datasets, and related artifacts across platforms, such as Hugging Face, GitHub, and Zenodo, has amplified the need for structured and standardized metadata. However, metadata practices remain highly heterogeneous, differing in schema design, vocabulary usage, and semantic expressiveness, posing significant challenges for tasks such as representation, extraction, alignment, and integration. This fragmentation impedes the development of infrastructures that depend on machine-actionable metadata to support discovery, provenance tracking, or cross-platform interoperability. While metadata is also foundational to enabling FAIR (Findable, Accessible, Interoperable, and Reusable) principles in ML, there is a lack of consolidated understanding of how existing standards support interoperability and alignment across platforms. In this survey, we review and compare a range of general-purpose and ML-specific metadata standards, evaluating their suitability for cross-platform alignment, discoverability, extensibility, and interoperability. We assess these standards based on defined criteria and analyze their potential to support unified, FAIR-compliant metadata infrastructures for ML, laying the groundwork for scalable and interoperable tooling in future ML ecosystems.
Author(s)
Gesese, Genet-Asefa
FIZ Karlsruhe - Leibniz Institute for Information Infrastructure
Chen, Zongxiong
Fraunhofer-Institut für Offene Kommunikationssysteme FOKUS  
Zoubia, Oussama
University of Cologne
Limani, Fidan
ZBW - Leibniz-Informationszentrum Wirtschaft
Silva, Kanishka
GESIS - Leibniz-Institute for the Social Sciences
Suryani, Muhammad Asif
GESIS - Leibniz-Institute for the Social Sciences
Zapilko, Benjamin
GESIS - Leibniz-Institute for the Social Sciences
Castro, Leyla Jael
ZB MED - Information Centre for Life Sciences
Kutafina, Ekaterina
University of Cologne
Solanki, Dhwani
ZB MED - Information Centre for Life Sciences
Fliegl, Heike
FIZ Karlsruhe - Leibniz Institute for Information Infrastructure
Schimmler, Sonja  
Fraunhofer-Institut für Offene Kommunikationssysteme FOKUS  
Boukhers, Zeyd  
University of Cologne
Sack, Harald
Karlsruhe Institute of Technology
Mainwork
5th International Workshop on Scientific Knowledge: Representation, Discovery, and Assessment, Sci-K 2025. Proceedings  
Project(s)
NFDI für Datenwissenschaften und Künstliche Intelligenz  
Funder
Deutsche Forschungsgemeinschaft -DFG-, Bonn  
Conference
International Workshop on Scientific Knowledge - Representation, Discovery, and Assessment 2025  
International Semantic Web Conference 2025  
Open Access
Rights
CC BY 4.0: Creative Commons Attribution
DOI
10.24406/publica-5924
Language
English
Fraunhofer Institute(s)
Fraunhofer-Institut für Angewandte Informationstechnik FIT
Fraunhofer-Institut für Offene Kommunikationssysteme FOKUS
Keyword(s)
  • Metadata

  • Machine Learning

  • Datasets

  • FAIR

  • Research Artifacts Harmonization
