2024
Conference Paper
Title

Enhancing Machine Learning Capabilities in Data Lakes with AutoML and LLMs

Abstract
The exponential growth of data resulting from digitization calls for efficient storage and use of large data volumes. Data lakes can store heterogeneous datasets and prepare them for machine learning (ML). However, current data lakes lack mature capabilities to support ML requirements. AutoML is the process of automating the end-to-end application of ML to real-world problems. Large Language Models (LLMs) can potentially increase the automation of ML pipelines by assisting at various stages of the process and by democratizing access to advanced analytics. This paper explores the integration of AutoML tools and LLMs and their application in the data lake SEDAR. We present an extended data lake metadata model for capturing data analytics, a Python package that wraps AutoML libraries, and a module that leverages LLMs for AutoML. Finally, we compare the performance of AutoML and LLMs on four challenging real-world use cases from the domain of chemistry, each representing a distinct type of ML problem.
Author(s)
Hoseini, Sayed
Ibbels, Maximilian
Quix, Christoph  
Fraunhofer-Institut für Angewandte Informationstechnik FIT  
Mainwork
Advances in Databases and Information Systems  
Conference
European Conference on Advances in Databases and Information Systems 2024  
DOI
10.1007/978-3-031-70626-4_13
Language
English
Keyword(s)
  • AutoML

  • Data Lakes

  • LLMs
